Re: [PATCH] powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y

2023-06-09 Thread Randy Dunlap
Hi--

On 5/16/23 11:54, Pali Rohár wrote:
> On Tuesday 16 May 2023 08:28:54 Randy Dunlap wrote:
>> In a randconfig with CONFIG_SERIAL_CPM=m and
>> CONFIG_PPC_EARLY_DEBUG_CPM=y, there is a build error:
>> ERROR: modpost: "udbg_putc" [drivers/tty/serial/cpm_uart/cpm_uart.ko] undefined!
>>
>> Prevent the build error by allowing PPC_EARLY_DEBUG_CPM only when
>> SERIAL_CPM=y.
>>
>> Fixes: c374e00e17f1 ("[POWERPC] Add early debug console for CPM serial ports.")
>> Signed-off-by: Randy Dunlap 
>> Cc: Scott Wood 
>> Cc: Kumar Gala 
>> Cc: "Pali Rohár" 
>> Cc: Michael Ellerman 
>> Cc: Nicholas Piggin 
>> Cc: Christophe Leroy 
>> Cc: linuxppc-dev@lists.ozlabs.org
> 
> Looks good,
> 
> Reviewed-by: Pali Rohár 

I'm still seeing this build error in linux-next even with other (PPC) CPM
patches applied.
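For anyone unfamiliar with the tristate semantics at play: "depends on SERIAL_CPM" is satisfied by SERIAL_CPM=m as well as =y, so the bool PPC_EARLY_DEBUG_CPM could be built-in while udbg_putc's only user lives in cpm_uart.ko. A simplified sketch of the relevant Kconfig shape (prompts abbreviated; the real definitions are in drivers/tty/serial/Kconfig and arch/powerpc/Kconfig.debug):

```kconfig
config SERIAL_CPM
	tristate "CPM SCC/SMC serial port support"

config PPC_EARLY_DEBUG_CPM
	bool "Early serial debugging for Freescale CPM-based serial ports"
	# A plain "depends on SERIAL_CPM" also accepts SERIAL_CPM=m, which
	# lets this built-in bool reference symbols that end up in a module.
	depends on SERIAL_CPM=y
```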

> 
>> ---
>>  arch/powerpc/Kconfig.debug |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff -- a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
>> --- a/arch/powerpc/Kconfig.debug
>> +++ b/arch/powerpc/Kconfig.debug
>> @@ -240,7 +240,7 @@ config PPC_EARLY_DEBUG_40x
>>  
>>  config PPC_EARLY_DEBUG_CPM
>>  bool "Early serial debugging for Freescale CPM-based serial ports"
>> -depends on SERIAL_CPM
>> +depends on SERIAL_CPM=y
>>  help
>>Select this to enable early debugging for Freescale chips
>>using a CPM-based serial port.  This assumes that the bootwrapper

-- 
~Randy


Re: [PATCH v4 1/2] powerpc/legacy_serial: Handle SERIAL_8250_FSL=n build failures

2023-06-09 Thread Randy Dunlap



On 6/9/23 06:39, Uwe Kleine-König wrote:
> With SERIAL_8250=y and SERIAL_8250_FSL_CONSOLE=n, both
> IS_ENABLED(CONFIG_SERIAL_8250) and IS_REACHABLE(CONFIG_SERIAL_8250)
> evaluate to true and so fsl8250_handle_irq() is used. However this
> function is only available if CONFIG_SERIAL_8250_CONSOLE=y (and thus
> SERIAL_8250_FSL=y).
> 
> To prepare SERIAL_8250_FSL becoming tristate and being enabled in more
> cases, check for IS_REACHABLE(CONFIG_SERIAL_8250_FSL) before making use
> of fsl8250_handle_irq(). This check is correct with and without the
> change to make SERIAL_8250_FSL modular.
> 
> Reported-by: Randy Dunlap 
> Fixes: 66eff0ef528b ("powerpc/legacy_serial: Warn about 8250 devices operated without active FSL workarounds")
> Signed-off-by: Uwe Kleine-König 

Acked-by: Randy Dunlap 
Tested-by: Randy Dunlap  # build-tested

Thanks.

> ---
>  arch/powerpc/kernel/legacy_serial.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/legacy_serial.c 
> b/arch/powerpc/kernel/legacy_serial.c
> index fdbd85aafeb1..6ee65741dbd5 100644
> --- a/arch/powerpc/kernel/legacy_serial.c
> +++ b/arch/powerpc/kernel/legacy_serial.c
> @@ -510,7 +510,7 @@ static void __init fixup_port_irq(int index,
>  
>   if (IS_ENABLED(CONFIG_SERIAL_8250) &&
>   of_device_is_compatible(np, "fsl,ns16550")) {
> - if (IS_REACHABLE(CONFIG_SERIAL_8250)) {
> + if (IS_REACHABLE(CONFIG_SERIAL_8250_FSL)) {
>   port->handle_irq = fsl8250_handle_irq;
>   port->has_sysrq = 
> IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
>   } else {

-- 
~Randy


Re: [RFC PATCH v2 5/6] KVM: PPC: Add support for nested PAPR guests

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 7:09 PM Nicholas Piggin  wrote:
[snip]
>
> You lost your comments.

Thanks

>
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> > b/arch/powerpc/include/asm/kvm_book3s.h
> > index 0ca2d8b37b42..c5c57552b447 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -12,6 +12,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  struct kvmppc_bat {
> >   u64 raw;
> > @@ -316,6 +317,57 @@ long int kvmhv_nested_page_fault(struct kvm_vcpu 
> > *vcpu);
> >
> >  void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
> >
> > +
> > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > +
> > +extern bool __kvmhv_on_papr;
> > +
> > +static inline bool kvmhv_on_papr(void)
> > +{
> > + return __kvmhv_on_papr;
> > +}
>
> It's a nitpick, but kvmhv_on_pseries() is because we're running KVM-HV
> on a pseries guest kernel. Which is a papr guest kernel. So this kind of
> doesn't make sense if you read it the same way.
>
> kvmhv_nested_using_papr() or something like that might read a bit
> better.

Will we go with kvmhv_using_nested_v2()?

>
> This could be a static key too.

Will do.

>
> > @@ -575,6 +593,7 @@ struct kvm_vcpu_arch {
> >   ulong dscr;
> >   ulong amr;
> >   ulong uamor;
> > + ulong amor;
> >   ulong iamr;
> >   u32 ctrl;
> >   u32 dabrx;
>
> This belongs somewhere else.

It can be dropped.

>
> > @@ -829,6 +848,8 @@ struct kvm_vcpu_arch {
> >   u64 nested_hfscr;   /* HFSCR that the L1 requested for the nested 
> > guest */
> >   u32 nested_vcpu_id;
> >   gpa_t nested_io_gpr;
> > + /* For nested APIv2 guests*/
> > + struct kvmhv_papr_host papr_host;
> >  #endif
>
> This is not exactly a papr host. Might have to come up with a better
> name especially if we implement a L0 things could get confusing.

Any name ideas? nestedv2_state?

>
> > @@ -342,6 +343,203 @@ static inline long 
> > plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
> >   return rc;
> >  }
> >
> > +static inline long plpar_guest_create(unsigned long flags, unsigned long 
> > *guest_id)
> > +{
> > + unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
> > + unsigned long token;
> > + long rc;
> > +
> > + token = -1UL;
> > + while (true) {
> > + rc = plpar_hcall(H_GUEST_CREATE, retbuf, flags, token);
> > + if (rc == H_SUCCESS) {
> > + *guest_id = retbuf[0];
> > + break;
> > + }
> > +
> > + if (rc == H_BUSY) {
> > + token = retbuf[0];
> > + cpu_relax();
> > + continue;
> > + }
> > +
> > + if (H_IS_LONG_BUSY(rc)) {
> > + token = retbuf[0];
> > + mdelay(get_longbusy_msecs(rc));
>
> All of these things need a non-sleeping delay? Can we sleep instead?
> Or if not, might have to think about going back to the caller and it
> can retry.
>
> get/set state might be a bit inconvenient, although I don't expect
> that should take as long as guest and vcpu create/delete, so at
> least those ones would be good if they're called while preemptible.

Yeah, no reason not to sleep except for get/set; let me try it out.

>
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 521d84621422..f22ee582e209 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -383,6 +383,11 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu 
> > *vcpu)
> >   spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
> >  }
> >
> > +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
> > +{
> > + vcpu->arch.pvr = pvr;
> > +}
>
> Didn't you lose this in a previous patch? I thought it must have moved
> to a header but it reappears.

Yes, that was meant to stay put.

>
> > +
> >  /* Dummy value used in computing PCR value below */
> >  #define PCR_ARCH_31(PCR_ARCH_300 << 1)
> >
> > @@ -1262,13 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >   return RESUME_HOST;
> >   break;
> >  #endif
> > - case H_RANDOM:
> > + case H_RANDOM: {
> >   unsigned long rand;
> >
> >   if (!arch_get_random_seed_longs(&rand, 1))
> >   ret = H_HARDWARE;
> >   kvmppc_set_gpr(vcpu, 4, rand);
> >   break;
> > + }
> >   case H_RPT_INVALIDATE:
> >   ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
> > kvmppc_get_gpr(vcpu, 5),
>
> Compile fix for a previous patch.

Thanks.

>
> > @@ -2921,14 +2927,21 @@ static int kvmppc_core_vcpu_create_hv(struct 
> > kvm_vcpu *vcpu)
> >   vcpu->arch.shared_big_endian = false;
> >  #endif
> >  #endif
> > - kvmppc_set_mmcr_hv(vcpu, 0, MMCR0_FC);
> >
> > + if (kvmhv_on_papr()) {
> > + 

Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 6:27 PM Nicholas Piggin  wrote:
[snip]
>
> This is a tour de force in one of these things, so I hate to be
> the "me smash with club" guy, but what if you allocated buffers
> with enough room for all the state (or 99% of cases, in which
> case an overflow would make an hcall)?
>
> What's actually a fast-path that we don't get from the interrupt
> return buffer? Getting and setting a few regs for MMIO emulation?

As it is, a vcpu uses four buffers:

- One for registering its input and output buffers
   This is allocated just large enough for GSID_RUN_OUTPUT_MIN_SIZE,
   GSID_RUN_INPUT and GSID_RUN_OUTPUT.
   Freed once the buffers are registered.
   I suppose we could just make a buffer big enough to be used for the
vcpu run input buffer, then have it register its own address.

- One for process and partition table entries
   Because kvmhv_set_ptbl_entry() isn't associated with a vcpu.
   kvmhv_papr_set_ptbl_entry() allocates and frees a minimal sized
buffer on demand.

- The run vcpu input buffer
   Persists over the lifetime of the vcpu after creation. Large enough
to hold all VCPU-wide elements. The same buffer is also reused for:

 * GET state hcalls
 * SET guest-wide state hcalls (guest-wide state cannot be passed in
the vcpu run buffer)

- The run vcpu output buffer
   Persists over the lifetime of the vcpu after creation. This is
sized to GSID_RUN_OUTPUT_MIN_SIZE as returned by the L0.
   It's unlikely that it would be larger than the run vcpu input
buffer size, so I guess you could make it that size too. Probably you
could even use the run vcpu input buffer as the vcpu output buffer.

The buffers could all be that max size, combining the configuration,
input and output buffers, but I feel it's more understandable like
this.

[snip]

>
> The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
> you're calling the API that.

Will we go with KVM_NESTED_V2_ ?

>
> > +
> > +#define GSID_HOST_STATE_SIZE 0x0001 /* Size of Hypervisor Internal 
> > Format VCPU state */
> > +#define GSID_RUN_OUTPUT_MIN_SIZE 0x0002 /* Minimum size of the Run 
> > VCPU output buffer */
> > +#define GSID_LOGICAL_PVR 0x0003 /* Logical PVR */
> > +#define GSID_TB_OFFSET   0x0004 /* Timebase Offset */
> > +#define GSID_PARTITION_TABLE 0x0005 /* Partition Scoped Page Table 
> > */
> > +#define GSID_PROCESS_TABLE   0x0006 /* Process Table */
>
> > +
> > +#define GSID_RUN_INPUT   0x0C00 /* Run VCPU Input 
> > Buffer */
> > +#define GSID_RUN_OUTPUT  0x0C01 /* Run VCPU Out Buffer 
> > */
> > +#define GSID_VPA 0x0C02 /* HRA to Guest VCPU VPA */
> > +
> > +#define GSID_GPR(x)  (0x1000 + (x))
> > +#define GSID_HDEC_EXPIRY_TB  0x1020
> > +#define GSID_NIA 0x1021
> > +#define GSID_MSR 0x1022
> > +#define GSID_LR  0x1023
> > +#define GSID_XER 0x1024
> > +#define GSID_CTR 0x1025
> > +#define GSID_CFAR0x1026
> > +#define GSID_SRR00x1027
> > +#define GSID_SRR10x1028
> > +#define GSID_DAR 0x1029
>
> It's a shame you have to rip up all your wrapper functions now to
> shoehorn these in.
>
> If you included names analogous to the reg field names in the kvm
> structures, the wrappers could do macro expansions that get them.
>
> #define __GSID_WRAPPER_dar  GSID_DAR
>
> Or similar.

Before, I had something pretty hacky in the macro accessors, along
the lines of

 gsid_table[offsetof(vcpu, reg)]

to get the GSID for the register.

We can do the wrapper idea, I just worry that it is getting too magic.

>
> And since of course you have to explicitly enumerate all these, I
> wouldn't mind defining the types and lengths up-front rather than
> down in the type function. You'd like to be able to go through the
> spec and eyeball type, number, size.

Something like:

#define KVM_NESTED_V2_GS_NIA \
	(KVM_NESTED_V2_GSID_NIA | VCPU_WIDE | READ_WRITE | DOUBLE_WORD)

etc.?

>
> [snip]
>
> > +/**
> > + * gsb_paddress() - the physical address of buffer
> > + * @gsb: guest state buffer
> > + *
> > + * Returns the physical address of the buffer.
> > + */
> > +static inline u64 gsb_paddress(struct gs_buff *gsb)
> > +{
> > + return __pa(gsb_header(gsb));
> > +}
>
> > +/**
> > + * __gse_put_reg() - add a register type guest state element to a buffer
> > + * @gsb: guest state buffer to add element to
> > + * @iden: guest state ID
> > + * @val: host endian value
> > + *
> > + * Adds a register type guest state element. Uses the guest state ID for
> > + * determining the length of the guest element. If the guest state ID has
> > + * bits that can not be set they will be cleared.
> > + */
> > +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> > +{
> > + val

Re: [RFC PATCH v2 2/6] KVM: PPC: Add fpr getters and setters

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 5:56 PM Nicholas Piggin  wrote:
[snip]
>
> Is there a particular reason some reg sets are broken into their own
> patches? Looking at this hunk you think the VR one got missed, but it's
> in its own patch.
>
> Not really a big deal but I wouldn't mind them all in one patch. Or at
> least the FP/VR/VSR in one since they're quite regular and similar.

There's not really a reason.

Originally I had things even more broken apart, but then thought one
patch made more sense. Part way through squashing the patches I had a
change of heart and thought I'd see if people had a preference.

I'll just finish the squashing for the next series.

Thanks,
Jordan
>
> Thanks,
> Nick
>


Re: [RFC PATCH v2 1/6] KVM: PPC: Use getters and setters for vcpu register state

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 5:53 PM Nicholas Piggin  wrote:
[snip]
>
> The general idea is fine, some of the names could use a bit of
> improvement. What's a BOOK3S_WRAPPER for example, is it not a
> VCPU_WRAPPER, or alternatively why isn't a VCORE_WRAPPER Book3S
> as well?

Yeah, the names are not great. I didn't call it VCPU_WRAPPER because
I wanted to keep separate:
 BOOK3S_WRAPPER for book3s registers
 HV_WRAPPER for hv-specific registers
I will change it to something like you suggested.

[snip]
>
> Stray hunk I think.

Yep.

>
> > @@ -957,10 +957,32 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu 
> > *vcpu, u##size val) \
> >  vcpu->arch.shared->reg = cpu_to_le##size(val);   \
> >  }\
> >
> > +#define SHARED_CACHE_WRAPPER_GET(reg, size)  \
> > +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  
> >   \
> > +{\
> > + if (kvmppc_shared_big_endian(vcpu)) \
> > +return be##size##_to_cpu(vcpu->arch.shared->reg);\
> > + else\
> > +return le##size##_to_cpu(vcpu->arch.shared->reg);\
> > +}\
> > +
> > +#define SHARED_CACHE_WRAPPER_SET(reg, size)  \
> > +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
> >   \
> > +{\
> > + if (kvmppc_shared_big_endian(vcpu)) \
> > +vcpu->arch.shared->reg = cpu_to_be##size(val);   \
> > + else\
> > +vcpu->arch.shared->reg = cpu_to_le##size(val);   \
> > +}\
> > +
> >  #define SHARED_WRAPPER(reg, size)\
> >   SHARED_WRAPPER_GET(reg, size)   \
> >   SHARED_WRAPPER_SET(reg, size)   \
> >
> > +#define SHARED_CACHE_WRAPPER(reg, size)
> >   \
> > + SHARED_CACHE_WRAPPER_GET(reg, size) \
> > + SHARED_CACHE_WRAPPER_SET(reg, size) \
>
> SHARED_CACHE_WRAPPER that does the same thing as SHARED_WRAPPER.

That changes once the guest state buffer IDs are included in a later
patch.

>
> I know some of the names are a bit crufty but it's probably a good time
> to rethink them a bit.
>
> KVMPPC_VCPU_SHARED_REG_ACCESSOR or something like that. A few
> more keystrokes could help immensely.

Yes, I will do something like that, for the BOOK3S_WRAPPER and
HV_WRAPPER too.

>
> > diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
> > b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > index 34f1db212824..34bc0a8a1288 100644
> > --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> > @@ -305,7 +305,7 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, 
> > struct kvm_vcpu *vcpu, u6
> >   u32 pid;
> >
> >   lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> > - pid = vcpu->arch.pid;
> > + pid = kvmppc_get_pid(vcpu);
> >
> >   /*
> >* Prior memory accesses to host PID Q3 must be completed before we
>
> Could add some accessors for get_lpid / get_guest_id which check for the
> correct KVM mode maybe.

True.

Thanks,
Jordan

>
> Thanks,
> Nick
>


Re: [RFC PATCH v2 0/6] KVM: PPC: Nested PAPR guests

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 3:54 PM Nicholas Piggin  wrote:
>
> On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> > There is existing support for nested guests on powernv hosts, however the
> > hcall interface this uses is not supported by other PAPR hosts.
>
> I kind of liked it being called nested-HV v1 and v2 APIs as short and
> to the point, but I suppose that's ambiguous with version 2 of the v1
> API, so papr is okay. What's the old API called in this scheme, then?
> "Existing API" is not great after patches go upstream.

Yes, I was trying for a more descriptive name but it is just more
confusing, and I'm struggling for a better alternative.

In the next revision I'll use v1 and v2. For version 2 of the v1 API,
do we now call it v1.2 or something like that?

>
> And, you've probably explained it pretty well but slightly more of
> a background first up could be helpful. E.g.,
>
>   A nested-HV API for PAPR has been developed based on the KVM-specific
>   nested-HV API that is upstream in Linux/KVM and QEMU. The PAPR API
>   had to break compatibility to accommodate implementation in other
>   hypervisors and partitioning firmware.
>
> And key overall differences
>
>   The control flow and interrupt processing between L0, L1, and L2
>   in the new PAPR API are conceptually unchanged. Where the old API
>   is almost stateless, the PAPR API is stateful, with the L1 registering
>   L2 virtual machines and vCPUs with the L0. Supervisor-privileged
>   register switching duty is now the responsibility of the L0, which
>   holds canonical L2 register state and handles all switching. This
>   new register handling motivates the "getters and setters" wrappers
>   ...

I'll include something along those lines.

Thanks,
Jordan

>
> Thanks,
> Nick
>


Re: [PATCH RFC v2 6/6] docs: powerpc: Document nested KVM on POWER

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 3:38 PM Gautam Menghani  wrote:
>
> On Mon, Jun 05, 2023 at 04:48:48PM +1000, Jordan Niethe wrote:
> > From: Michael Neuling 
>
> Hi,
> There are some minor typos in the documentation pointed out below

Thank you, will correct in the next revision.

Jordan
>
>
> > +H_GUEST_GET_STATE()
> > +---
> > +
> > +This is called to get state associated with an L2 (Guest-wide or vCPU 
> > specific).
> > +This info is passed via the Guest State Buffer (GSB), a standard format as
> > +explained later in this doc, necessary details below:
> > +
> > +This can set either L2 wide or vcpu specific information. Examples of
>
> We are getting the info about vcpu here : s/set/get
>
> > +H_GUEST_RUN_VCPU()
> > +--
> > +
> > +This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
> > +parameters. The vCPU run with the state set previously using
>
> Minor nit : s/run/runs
>
> > +H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
> > +hcall.
> > +
> > +This hcall also has associated input and output GSBs. Unlike
> > +H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
> > +parameters to the hcall (This was done in the interest of
> > +performance). The locations of these GSBs must be preregistered using
> > +the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
> > +below).
> > +
> >
> > --
> > 2.31.1
> >
>


[PATCH v3 06/25] iommu/tegra-gart: Remove tegra-gart

2023-06-09 Thread Jason Gunthorpe
Thierry says this is not used anymore, and doesn't think it makes sense as
an iommu driver. The HW it supports is about 10 years old now and newer HW
uses different IOMMU drivers.

As this is the only driver with a GART approach, and it doesn't really
meet the driver expectations from the IOMMU core, let's just remove it
so we don't have to think about how to make it fit in.

It has a number of identified problems:
 - The assignment of iommu_groups doesn't match the HW behavior

 - It claims to have an UNMANAGED domain but it is really an IDENTITY
   domain with a translation aperture. This is inconsistent with the core
   expectation for security sensitive operations

 - It doesn't implement a SW page table under struct iommu_domain so
   * It can't accept a map until the domain is attached
   * It forgets about all maps after the domain is detached
   * It doesn't clear the HW of maps once the domain is detached
 (made worse by having the wrong groups)

Cc: Thierry Reding 
Cc: Dmitry Osipenko 
Acked-by: Thierry Reding 
Signed-off-by: Jason Gunthorpe 
---
 arch/arm/configs/multi_v7_defconfig |   1 -
 arch/arm/configs/tegra_defconfig|   1 -
 drivers/iommu/Kconfig   |  11 -
 drivers/iommu/Makefile  |   1 -
 drivers/iommu/tegra-gart.c  | 371 
 drivers/memory/tegra/mc.c   |  34 ---
 drivers/memory/tegra/tegra20.c  |  28 ---
 include/soc/tegra/mc.h  |  26 --
 8 files changed, 473 deletions(-)
 delete mode 100644 drivers/iommu/tegra-gart.c

diff --git a/arch/arm/configs/multi_v7_defconfig 
b/arch/arm/configs/multi_v7_defconfig
index 871fffe92187bf..daba1afdbd1100 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -1063,7 +1063,6 @@ CONFIG_BCM2835_MBOX=y
 CONFIG_QCOM_APCS_IPC=y
 CONFIG_QCOM_IPCC=y
 CONFIG_ROCKCHIP_IOMMU=y
-CONFIG_TEGRA_IOMMU_GART=y
 CONFIG_TEGRA_IOMMU_SMMU=y
 CONFIG_EXYNOS_IOMMU=y
 CONFIG_QCOM_IOMMU=y
diff --git a/arch/arm/configs/tegra_defconfig b/arch/arm/configs/tegra_defconfig
index f32047e24b633e..ad31b9322911ce 100644
--- a/arch/arm/configs/tegra_defconfig
+++ b/arch/arm/configs/tegra_defconfig
@@ -293,7 +293,6 @@ CONFIG_CHROME_PLATFORMS=y
 CONFIG_CROS_EC=y
 CONFIG_CROS_EC_I2C=m
 CONFIG_CROS_EC_SPI=m
-CONFIG_TEGRA_IOMMU_GART=y
 CONFIG_TEGRA_IOMMU_SMMU=y
 CONFIG_ARCH_TEGRA_2x_SOC=y
 CONFIG_ARCH_TEGRA_3x_SOC=y
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 4d800601e8ecd6..3309f297bbd822 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -235,17 +235,6 @@ config SUN50I_IOMMU
help
  Support for the IOMMU introduced in the Allwinner H6 SoCs.
 
-config TEGRA_IOMMU_GART
-   bool "Tegra GART IOMMU Support"
-   depends on ARCH_TEGRA_2x_SOC
-   depends on TEGRA_MC
-   select IOMMU_API
-   help
- Enables support for remapping discontiguous physical memory
- shared with the operating system into contiguous I/O virtual
- space through the GART (Graphics Address Relocation Table)
- hardware included on Tegra SoCs.
-
 config TEGRA_IOMMU_SMMU
bool "NVIDIA Tegra SMMU Support"
depends on ARCH_TEGRA
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce89..95ad9dbfbda022 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -20,7 +20,6 @@ obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o
 obj-$(CONFIG_OMAP_IOMMU_DEBUG) += omap-iommu-debug.o
 obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o
 obj-$(CONFIG_SUN50I_IOMMU) += sun50i-iommu.o
-obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o
 obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
deleted file mode 100644
index a482ff838b5331..00
--- a/drivers/iommu/tegra-gart.c
+++ /dev/null
@@ -1,371 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * IOMMU API for Graphics Address Relocation Table on Tegra20
- *
- * Copyright (c) 2010-2012, NVIDIA CORPORATION.  All rights reserved.
- *
- * Author: Hiroshi DOYU 
- */
-
-#define dev_fmt(fmt)   "gart: " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-#define GART_REG_BASE  0x24
-#define GART_CONFIG(0x24 - GART_REG_BASE)
-#define GART_ENTRY_ADDR(0x28 - GART_REG_BASE)
-#define GART_ENTRY_DATA(0x2c - GART_REG_BASE)
-
-#define GART_ENTRY_PHYS_ADDR_VALID BIT(31)
-
-#define GART_PAGE_SHIFT12
-#define GART_PAGE_SIZE (1 << GART_PAGE_SHIFT)
-#define GART_PAGE_MASK GENMASK(30, GART_PAGE_SHIFT)
-
-/* bitmap of the page sizes currently supported */
-#define GART_IOMMU_PGSIZES (GART_PAGE_SIZE)
-
-struct gart_device {
-   void __iomem*regs;
-   u32 *savedata;
-   unsigned long   iovmm_ba

[PATCH v3 10/25] iommu/exynos: Implement an IDENTITY domain

2023-06-09 Thread Jason Gunthorpe
What exynos calls exynos_iommu_detach_device is actually putting the iommu
into identity mode.

Move to the new core support for ARM_DMA_USE_IOMMU by defining
ops->identity_domain.

Tested-by: Marek Szyprowski 
Acked-by: Marek Szyprowski 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/exynos-iommu.c | 66 +---
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index c275fe71c4db32..5e12b85dfe8705 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -24,6 +24,7 @@
 
 typedef u32 sysmmu_iova_t;
 typedef u32 sysmmu_pte_t;
+static struct iommu_domain exynos_identity_domain;
 
 /* We do not consider super section mapping (16MB) */
 #define SECT_ORDER 20
@@ -829,7 +830,7 @@ static int __maybe_unused exynos_sysmmu_suspend(struct 
device *dev)
struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(&owner->rpm_lock);
-   if (data->domain) {
+   if (&data->domain->domain != &exynos_identity_domain) {
dev_dbg(data->sysmmu, "saving state\n");
__sysmmu_disable(data);
}
@@ -847,7 +848,7 @@ static int __maybe_unused exynos_sysmmu_resume(struct 
device *dev)
struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
 
mutex_lock(&owner->rpm_lock);
-   if (data->domain) {
+   if (&data->domain->domain != &exynos_identity_domain) {
dev_dbg(data->sysmmu, "restoring state\n");
__sysmmu_enable(data);
}
@@ -980,17 +981,20 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
kfree(domain);
 }
 
-static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain,
-   struct device *dev)
+static int exynos_iommu_identity_attach(struct iommu_domain *identity_domain,
+   struct device *dev)
 {
-   struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
-   phys_addr_t pagetable = virt_to_phys(domain->pgtable);
+   struct exynos_iommu_domain *domain;
+   phys_addr_t pagetable;
struct sysmmu_drvdata *data, *next;
unsigned long flags;
 
-   if (!has_sysmmu(dev) || owner->domain != iommu_domain)
-   return;
+   if (owner->domain == identity_domain)
+   return 0;
+
+   domain = to_exynos_domain(owner->domain);
+   pagetable = virt_to_phys(domain->pgtable);
 
mutex_lock(&owner->rpm_lock);
 
@@ -1009,15 +1013,25 @@ static void exynos_iommu_detach_device(struct 
iommu_domain *iommu_domain,
list_del_init(&data->domain_node);
spin_unlock(&data->lock);
}
-   owner->domain = NULL;
+   owner->domain = identity_domain;
spin_unlock_irqrestore(&domain->lock, flags);
 
mutex_unlock(&owner->rpm_lock);
 
-   dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__,
-   &pagetable);
+   dev_dbg(dev, "%s: Restored IOMMU to IDENTITY from pgtable %pa\n",
+   __func__, &pagetable);
+   return 0;
 }
 
+static struct iommu_domain_ops exynos_identity_ops = {
+   .attach_dev = exynos_iommu_identity_attach,
+};
+
+static struct iommu_domain exynos_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &exynos_identity_ops,
+};
+
 static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain,
   struct device *dev)
 {
@@ -1026,12 +1040,11 @@ static int exynos_iommu_attach_device(struct 
iommu_domain *iommu_domain,
struct sysmmu_drvdata *data;
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
unsigned long flags;
+   int err;
 
-   if (!has_sysmmu(dev))
-   return -ENODEV;
-
-   if (owner->domain)
-   exynos_iommu_detach_device(owner->domain, dev);
+   err = exynos_iommu_identity_attach(&exynos_identity_domain, dev);
+   if (err)
+   return err;
 
mutex_lock(&owner->rpm_lock);
 
@@ -1407,26 +1420,12 @@ static struct iommu_device 
*exynos_iommu_probe_device(struct device *dev)
return &data->iommu;
 }
 
-static void exynos_iommu_set_platform_dma(struct device *dev)
-{
-   struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
-
-   if (owner->domain) {
-   struct iommu_group *group = iommu_group_get(dev);
-
-   if (group) {
-   exynos_iommu_detach_device(owner->domain, dev);
-   iommu_group_put(group);
-   }
-   }
-}
-
 static void exynos_iommu_release_device(struct device *dev)
 {
struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);

[PATCH v3 15/25] iommufd/selftest: Make the mock iommu driver into a real driver

2023-06-09 Thread Jason Gunthorpe
I've avoided doing this because there is no way to make this happen
without an intrusion into the core code. Up till now this has avoided
needing the core code's probe path with some hackery - but now that
default domains are becoming mandatory it is unavoidable. The core probe
path must be run to set the default_domain, only it can do it. Without
a default domain iommufd can't use the group.

Make it so that the iommufd selftest can create a real iommu driver and
bind it only to its own private bus. Add iommu_device_register_bus() as
a core code helper to make this possible. It simply sets the right
pointers and registers the notifier block. The mock driver then works
like any normal driver should, with probe triggered by the bus ops.

When the bus->iommu_ops stuff is fully unwound we can probably do better
here and remove this special case.

Remove set_platform_dma_ops from selftest and make it use a BLOCKED
default domain.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu-priv.h  |  16 +++
 drivers/iommu/iommu.c   |  43 
 drivers/iommu/iommufd/iommufd_private.h |   5 +-
 drivers/iommu/iommufd/main.c|   8 +-
 drivers/iommu/iommufd/selftest.c| 141 +---
 5 files changed, 144 insertions(+), 69 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
new file mode 100644
index 00..1cbc04b9cf7297
--- /dev/null
+++ b/drivers/iommu/iommu-priv.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef __IOMMU_PRIV_H
+#define __IOMMU_PRIV_H
+
+#include 
+
+int iommu_device_register_bus(struct iommu_device *iommu,
+ const struct iommu_ops *ops, struct bus_type *bus,
+ struct notifier_block *nb);
+void iommu_device_unregister_bus(struct iommu_device *iommu,
+struct bus_type *bus,
+struct notifier_block *nb);
+
+#endif
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7ca70e2a3f51e9..a3a4d004767b4d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -36,6 +36,7 @@
 #include "dma-iommu.h"
 
 #include "iommu-sva.h"
+#include "iommu-priv.h"
 
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
@@ -287,6 +288,48 @@ void iommu_device_unregister(struct iommu_device *iommu)
 }
 EXPORT_SYMBOL_GPL(iommu_device_unregister);
 
+#if IS_ENABLED(CONFIG_IOMMUFD_TEST)
+void iommu_device_unregister_bus(struct iommu_device *iommu,
+struct bus_type *bus,
+struct notifier_block *nb)
+{
+   bus_unregister_notifier(bus, nb);
+   iommu_device_unregister(iommu);
+}
+EXPORT_SYMBOL_GPL(iommu_device_unregister_bus);
+
+/*
+ * Register an iommu driver against a single bus. This is only used by iommufd
+ * selftest to create a mock iommu driver. The caller must provide
+ * some memory to hold a notifier_block.
+ */
+int iommu_device_register_bus(struct iommu_device *iommu,
+ const struct iommu_ops *ops, struct bus_type *bus,
+ struct notifier_block *nb)
+{
+   int err;
+
+   iommu->ops = ops;
+   nb->notifier_call = iommu_bus_notifier;
+   err = bus_register_notifier(bus, nb);
+   if (err)
+   return err;
+
+   spin_lock(&iommu_device_lock);
+   list_add_tail(&iommu->list, &iommu_device_list);
+   spin_unlock(&iommu_device_lock);
+
+   bus->iommu_ops = ops;
+   err = bus_iommu_probe(bus);
+   if (err) {
+   iommu_device_unregister_bus(iommu, bus, nb);
+   return err;
+   }
+   return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_device_register_bus);
+#endif
+
 static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
struct dev_iommu *param = dev->iommu;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b38e67d1988bdb..368f66c63a239a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -303,7 +303,7 @@ extern size_t iommufd_test_memory_limit;
 void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
   unsigned int ioas_id, u64 *iova, u32 *flags);
 bool iommufd_should_fail(void);
-void __init iommufd_test_init(void);
+int __init iommufd_test_init(void);
 void iommufd_test_exit(void);
 bool iommufd_selftest_is_mock_dev(struct device *dev);
 #else
@@ -316,8 +316,9 @@ static inline bool iommufd_should_fail(void)
 {
return false;
 }
-static inline void __init iommufd_test_init(void)
+static inline int __init iommufd_test_init(void)
 {
+   return 0;
 }
 static inline void iommufd_test_exit(void)
 {
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a69..0

[PATCH v3 14/25] iommu/msm: Implement an IDENTITY domain

2023-06-09 Thread Jason Gunthorpe
What msm does during msm_iommu_set_platform_dma() is actually putting the
iommu into identity mode.

Move to the new core support for ARM_DMA_USE_IOMMU by defining
ops->identity_domain.

This driver does not support IOMMU_DOMAIN_DMA; however, it cannot be
compiled on ARM64 anyway. Most likely it would be fine to support dma-iommu.c.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/msm_iommu.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 79d89bad5132b7..26ed81cfeee897 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -443,15 +443,20 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
return ret;
 }
 
-static void msm_iommu_set_platform_dma(struct device *dev)
+static int msm_iommu_identity_attach(struct iommu_domain *identity_domain,
+struct device *dev)
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct msm_priv *priv = to_msm_priv(domain);
+   struct msm_priv *priv;
unsigned long flags;
struct msm_iommu_dev *iommu;
struct msm_iommu_ctx_dev *master;
-   int ret;
+   int ret = 0;
 
+   if (domain == identity_domain || !domain)
+   return 0;
+
+   priv = to_msm_priv(domain);
free_io_pgtable_ops(priv->iop);
 
spin_lock_irqsave(&msm_iommu_lock, flags);
@@ -468,8 +473,18 @@ static void msm_iommu_set_platform_dma(struct device *dev)
}
 fail:
spin_unlock_irqrestore(&msm_iommu_lock, flags);
+   return ret;
 }
 
+static struct iommu_domain_ops msm_iommu_identity_ops = {
+   .attach_dev = msm_iommu_identity_attach,
+};
+
+static struct iommu_domain msm_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &msm_iommu_identity_ops,
+};
+
 static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t pa, size_t pgsize, size_t pgcount,
 int prot, gfp_t gfp, size_t *mapped)
@@ -675,10 +690,10 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 }
 
 static struct iommu_ops msm_iommu_ops = {
+   .identity_domain = &msm_iommu_identity_domain,
.domain_alloc = msm_iommu_domain_alloc,
.probe_device = msm_iommu_probe_device,
.device_group = generic_device_group,
-   .set_platform_dma_ops = msm_iommu_set_platform_dma,
.pgsize_bitmap = MSM_IOMMU_PGSIZES,
.of_xlate = qcom_iommu_of_xlate,
.default_domain_ops = &(const struct iommu_domain_ops) {
-- 
2.40.1



[PATCH v3 00/25] iommu: Make default_domains mandatory

2023-06-09 Thread Jason Gunthorpe
[ It would be good to get this in linux-next, we have some good test
coverage on the ARM side already, thanks! ]

It has been a long time coming, this series completes the default_domain
transition and makes it so that the core IOMMU code will always have a
non-NULL default_domain for every driver on every
platform. set_platform_dma_ops() turned out to be a bad idea, and so it
is removed completely.

This is achieved by changing each driver to either:

1 - Convert the existing (or deleted) ops->detach_dev() into an
op->attach_dev() of an IDENTITY domain.

This is based on the theory that the ARM32 HW is able to function when
the iommu is turned off, and so the turned-off state is an IDENTITY
translation.

2 - Use a new PLATFORM domain type. This is a hack to accommodate drivers
that we don't really know WTF they do. S390 is legitimately using this
to switch to its platform dma_ops implementation, which is where the
name comes from.

3 - Do #1 and force the default domain to be IDENTITY; this corrects
the tegra-smmu case where even an ARM64 system would have a NULL
default_domain.

Using this we can apply the rules:

a) ARM_DMA_USE_IOMMU mode always uses either the driver's
   ops->default_domain, ops->def_domain_type(), or an IDENTITY domain.
   All ARM32 drivers provide one of these three options.

b) dma-iommu.c mode uses either the driver's ops->default_domain,
   ops->def_domain_type or the usual DMA API policy logic based on the
   command line/etc to pick IDENTITY/DMA domain types

c) All other arch's (PPC/S390) use ops->default_domain always.

See the patch "Require a default_domain for all iommu drivers" for a
per-driver breakdown.

The conversion broadly teaches a bunch of ARM32 drivers that they can do
IDENTITY domains. There is some educated guessing involved that these are
actual IDENTITY domains. If this turns out to be wrong the driver can be
trivially changed to use a BLOCKING domain type instead. Further, the
domain type only matters for drivers using ARM64's dma-iommu.c mode as it
will select IDENTITY based on the command line and expect IDENTITY to
work. For ARM32 and other arch cases it is purely documentation.

Finally, based on all the analysis in this series, we can purge
IOMMU_DOMAIN_UNMANAGED/DMA constants from most of the drivers. This
greatly simplifies understanding the driver contract to the core
code. IOMMU drivers should not be involved in policy for how the DMA API
works; that should be a core code decision.

The main gain from this work is to remove a lot of ARM_DMA_USE_IOMMU
specific code and behaviors from drivers. All that remains in iommu
drivers after this series is the calls to arm_iommu_create_mapping().

This is a step toward removing ARM_DMA_USE_IOMMU.

The IDENTITY domains added to the ARM64 supporting drivers can be tested
by booting in ARM64 mode and enabling CONFIG_IOMMU_DEFAULT_PASSTHROUGH. If
the system still boots then most likely the implementation is an IDENTITY
domain. If not we can trivially change it to BLOCKING or at worst PLATFORM
if there is no detail what is going on in the HW.

I think this is pretty safe for the ARM32 drivers as they don't really
change: the code that was in detach_dev continues to be called in the
same places it was called before.

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_all_defdom

v3:
 - FSL is back to a PLATFORM domain, with some fixing so it attach only
   does something when leaving an UNMANAGED domain like it always was
 - Rebase on Joerg's tree, adjust for "alloc_type" change
 - Change the ARM32 untrusted check to a WARN_ON since no ARM32 system
   can currently set trusted
v2: https://lore.kernel.org/r/0-v2-8d1dc464eac9+10f-iommu_all_defdom_...@nvidia.com
 - FSL is an IDENTITY domain
 - Delete tegra-gart instead of trying to carry it
 - Use the policy determination from iommu_get_default_domain_type() to
   drive the arm_iommu mode
 - Reorganize and introduce new patches to do the above:
* Split the ops->identity_domain to an independent earlier patch
* Remove the UNMANAGED return from def_domain_type in mtk_v1 earlier
  so the new iommu_get_default_domain_type() can work
* Make the driver's def_domain_type have higher policy priority than
  untrusted
* Merge the set_platform_dma_ops hunk from mtk_v1 along with rockchip
  into the patch that forced IDENTITY on ARM32
 - Revise sun50i to be cleaner and have a non-NULL internal domain
 - Reword logging in exynos
 - Remove the gdev from the group alloc path, instead add a new
   function __iommu_group_domain_alloc() that takes in the group
   and uses the first device. Split this to its own patch
 - New patch to make iommufd's mock selftest into a real driver
 - New patch to fix power's partial iommu driver
v1: https://lore.kernel.org/r/0-v1-21cc72fcfb22+a7a-iommu_all_defdom_...@nvidia.com

Jason Gunthorpe (25):
  iommu: Add iommu_ops->identity_domain
  iommu: Add IOMMU_DOMAIN_PLATFORM
  po

[PATCH v3 24/25] iommu: Convert simple drivers with DOMAIN_DMA to domain_alloc_paging()

2023-06-09 Thread Jason Gunthorpe
These drivers are all trivially converted since the function is only
called if the domain type is going to be
IOMMU_DOMAIN_UNMANAGED/DMA.

Tested-by: Heiko Stuebner 
Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c | 6 ++
 drivers/iommu/exynos-iommu.c| 7 ++-
 drivers/iommu/ipmmu-vmsa.c  | 7 ++-
 drivers/iommu/mtk_iommu.c   | 7 ++-
 drivers/iommu/rockchip-iommu.c  | 7 ++-
 drivers/iommu/sprd-iommu.c  | 7 ++-
 drivers/iommu/sun50i-iommu.c| 9 +++--
 drivers/iommu/tegra-smmu.c  | 7 ++-
 8 files changed, 17 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 9d7b9d8b4386d4..a2140fdc65ed58 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -319,12 +319,10 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
return ret;
 }
 
-static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *qcom_iommu_domain_alloc_paging(struct device *dev)
 {
struct qcom_iommu_domain *qcom_domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
-   return NULL;
/*
 * Allocate the domain and initialise some of its data structures.
 * We can't really do anything meaningful until we've added a
@@ -593,7 +591,7 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 static const struct iommu_ops qcom_iommu_ops = {
.identity_domain = &qcom_iommu_identity_domain,
.capable= qcom_iommu_capable,
-   .domain_alloc   = qcom_iommu_domain_alloc,
+   .domain_alloc_paging = qcom_iommu_domain_alloc_paging,
.probe_device   = qcom_iommu_probe_device,
.device_group   = generic_device_group,
.of_xlate   = qcom_iommu_of_xlate,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 5e12b85dfe8705..d6dead2ed10c11 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -887,7 +887,7 @@ static inline void exynos_iommu_set_pte(sysmmu_pte_t *ent, sysmmu_pte_t val)
   DMA_TO_DEVICE);
 }
 
-static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
 {
struct exynos_iommu_domain *domain;
dma_addr_t handle;
@@ -896,9 +896,6 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
/* Check if correct PTE offsets are initialized */
BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev);
 
-   if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
@@ -1472,7 +1469,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops exynos_iommu_ops = {
.identity_domain = &exynos_identity_domain,
-   .domain_alloc = exynos_iommu_domain_alloc,
+   .domain_alloc_paging = exynos_iommu_domain_alloc_paging,
.device_group = generic_device_group,
.probe_device = exynos_iommu_probe_device,
.release_device = exynos_iommu_release_device,
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index de958e411a92e0..27d36347e0fced 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -566,13 +566,10 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
-static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
+static struct iommu_domain *ipmmu_domain_alloc_paging(struct device *dev)
 {
struct ipmmu_vmsa_domain *domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
-   return NULL;
-
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
@@ -891,7 +888,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 
 static const struct iommu_ops ipmmu_ops = {
.identity_domain = &ipmmu_iommu_identity_domain,
-   .domain_alloc = ipmmu_domain_alloc,
+   .domain_alloc_paging = ipmmu_domain_alloc_paging,
.probe_device = ipmmu_probe_device,
.release_device = ipmmu_release_device,
.probe_finalize = ipmmu_probe_finalize,
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index fdb7f5162b1d64..3590d3399add32 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -667,13 +667,10 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
return 0;
 }
 
-static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *mtk_iommu_domain_a

[PATCH v3 25/25] iommu: Convert remaining simple drivers to domain_alloc_paging()

2023-06-09 Thread Jason Gunthorpe
These drivers don't support IOMMU_DOMAIN_DMA, so this commit effectively
allows them to support that mode.

The prior work to require default_domains makes this safe because every
one of these drivers either cannot be compiled together with dma-iommu.c,
or already establishes a default_domain. In both cases alloc_domain()
will never be called with IOMMU_DOMAIN_DMA for these drivers so it is safe
to drop the test.

Removing these tests clarifies that the domain allocation path is only
about the functionality of a paging domain and has nothing to do with
policy of how the paging domain is used for UNMANAGED/DMA/DMA_FQ.

Tested-by: Niklas Schnelle 
Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/msm_iommu.c| 7 ++-
 drivers/iommu/mtk_iommu_v1.c | 7 ++-
 drivers/iommu/omap-iommu.c   | 7 ++-
 drivers/iommu/s390-iommu.c   | 7 ++-
 4 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 26ed81cfeee897..a163cee0b7242d 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -302,13 +302,10 @@ static void __program_context(void __iomem *base, int ctx,
SET_M(base, ctx, 1);
 }
 
-static struct iommu_domain *msm_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *msm_iommu_domain_alloc_paging(struct device *dev)
 {
struct msm_priv *priv;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
goto fail_nomem;
@@ -691,7 +688,7 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
 
 static struct iommu_ops msm_iommu_ops = {
.identity_domain = &msm_iommu_identity_domain,
-   .domain_alloc = msm_iommu_domain_alloc,
+   .domain_alloc_paging = msm_iommu_domain_alloc_paging,
.probe_device = msm_iommu_probe_device,
.device_group = generic_device_group,
.pgsize_bitmap = MSM_IOMMU_PGSIZES,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 7c0c1d50df5f75..67e044c1a7d93b 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -270,13 +270,10 @@ static int mtk_iommu_v1_domain_finalise(struct mtk_iommu_v1_data *data)
return 0;
 }
 
-static struct iommu_domain *mtk_iommu_v1_domain_alloc(unsigned type)
+static struct iommu_domain *mtk_iommu_v1_domain_alloc_paging(struct device *dev)
 {
struct mtk_iommu_v1_domain *dom;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
dom = kzalloc(sizeof(*dom), GFP_KERNEL);
if (!dom)
return NULL;
@@ -585,7 +582,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
 
 static const struct iommu_ops mtk_iommu_v1_ops = {
.identity_domain = &mtk_iommu_v1_identity_domain,
-   .domain_alloc   = mtk_iommu_v1_domain_alloc,
+   .domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
.probe_device   = mtk_iommu_v1_probe_device,
.probe_finalize = mtk_iommu_v1_probe_finalize,
.release_device = mtk_iommu_v1_release_device,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 34340ef15241bc..fcf99bd195b32e 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1580,13 +1580,10 @@ static struct iommu_domain omap_iommu_identity_domain = {
.ops = &omap_iommu_identity_ops,
 };
 
-static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *omap_iommu_domain_alloc_paging(struct device *dev)
 {
struct omap_iommu_domain *omap_domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
omap_domain = kzalloc(sizeof(*omap_domain), GFP_KERNEL);
if (!omap_domain)
return NULL;
@@ -1748,7 +1745,7 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
 
 static const struct iommu_ops omap_iommu_ops = {
.identity_domain = &omap_iommu_identity_domain,
-   .domain_alloc   = omap_iommu_domain_alloc,
+   .domain_alloc_paging = omap_iommu_domain_alloc_paging,
.probe_device   = omap_iommu_probe_device,
.release_device = omap_iommu_release_device,
.device_group   = omap_iommu_device_group,
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index f0c867c57a5b9b..5695ad71d60e24 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -39,13 +39,10 @@ static bool s390_iommu_capable(struct device *dev, enum iommu_cap cap)
}
 }
 
-static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
+static struct iommu_domain *s390_domain_alloc_paging(struct device *dev)
 {
struct s390_domain *s390_domain;
 
-   if (domain_type != IOMMU_DOMAIN_UNMANAGED)
-   return NULL;
-
s390_domain = kzalloc(sizeof(*s39

[PATCH v3 20/25] iommu/sun50i: Add an IOMMU_IDENTITY_DOMAIN

2023-06-09 Thread Jason Gunthorpe
Prior to commit 1b932ceddd19 ("iommu: Remove detach_dev callbacks") the
sun50i_iommu_detach_device() function was being called by
ops->detach_dev().

This is an IDENTITY domain so convert sun50i_iommu_detach_device() into
sun50i_iommu_identity_attach() and a full IDENTITY domain and thus hook it
back up the same way as the old ops->detach_dev().

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/sun50i-iommu.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 74c5cb93e90027..0bf08b120cf105 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -757,21 +757,32 @@ static void sun50i_iommu_detach_domain(struct sun50i_iommu *iommu,
iommu->domain = NULL;
 }
 
-static void sun50i_iommu_detach_device(struct iommu_domain *domain,
-  struct device *dev)
+static int sun50i_iommu_identity_attach(struct iommu_domain *identity_domain,
+   struct device *dev)
 {
-   struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
struct sun50i_iommu *iommu = dev_iommu_priv_get(dev);
+   struct sun50i_iommu_domain *sun50i_domain;
 
dev_dbg(dev, "Detaching from IOMMU domain\n");
 
-   if (iommu->domain != domain)
-   return;
+   if (iommu->domain == identity_domain)
+   return 0;
 
+   sun50i_domain = to_sun50i_domain(iommu->domain);
if (refcount_dec_and_test(&sun50i_domain->refcnt))
sun50i_iommu_detach_domain(iommu, sun50i_domain);
+   return 0;
 }
 
+static struct iommu_domain_ops sun50i_iommu_identity_ops = {
+   .attach_dev = sun50i_iommu_identity_attach,
+};
+
+static struct iommu_domain sun50i_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &sun50i_iommu_identity_ops,
+};
+
 static int sun50i_iommu_attach_device(struct iommu_domain *domain,
  struct device *dev)
 {
@@ -789,8 +800,7 @@ static int sun50i_iommu_attach_device(struct iommu_domain *domain,
if (iommu->domain == domain)
return 0;
 
-   if (iommu->domain)
-   sun50i_iommu_detach_device(iommu->domain, dev);
+   sun50i_iommu_identity_attach(&sun50i_iommu_identity_domain, dev);
 
sun50i_iommu_attach_domain(iommu, sun50i_domain);
 
@@ -827,6 +837,7 @@ static int sun50i_iommu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops sun50i_iommu_ops = {
+   .identity_domain = &sun50i_iommu_identity_domain,
.pgsize_bitmap  = SZ_4K,
.device_group   = sun50i_iommu_device_group,
.domain_alloc   = sun50i_iommu_domain_alloc,
@@ -985,6 +996,7 @@ static int sun50i_iommu_probe(struct platform_device *pdev)
if (!iommu)
return -ENOMEM;
spin_lock_init(&iommu->iommu_lock);
+   iommu->domain = &sun50i_iommu_identity_domain;
platform_set_drvdata(pdev, iommu);
iommu->dev = &pdev->dev;
 
-- 
2.40.1



[PATCH v3 16/25] iommu: Remove ops->set_platform_dma_ops()

2023-06-09 Thread Jason Gunthorpe
All drivers are now using IDENTITY or PLATFORM domains for what this did,
so we can remove it now. It is no longer possible to attach to a NULL domain.

Tested-by: Heiko Stuebner 
Tested-by: Niklas Schnelle 
Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 30 +-
 include/linux/iommu.h |  4 
 2 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a3a4d004767b4d..e60640f6ccb625 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2250,21 +2250,8 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
if (group->domain == new_domain)
return 0;
 
-   /*
-* New drivers should support default domains, so set_platform_dma()
-* op will never be called. Otherwise the NULL domain represents some
-* platform specific behavior.
-*/
-   if (!new_domain) {
-   for_each_group_device(group, gdev) {
-   const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
-
-   if (!WARN_ON(!ops->set_platform_dma_ops))
-   ops->set_platform_dma_ops(gdev->dev);
-   }
-   group->domain = NULL;
-   return 0;
-   }
+   if (WARN_ON(!new_domain))
+   return -EINVAL;
 
/*
 * Changing the domain is done by calling attach_dev() on the new
@@ -2300,19 +2287,15 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
 */
last_gdev = gdev;
for_each_group_device(group, gdev) {
-   const struct iommu_ops *ops = dev_iommu_ops(gdev->dev);
-
/*
-* If set_platform_dma_ops is not present a NULL domain can
-* happen only for first probe, in which case we leave
-* group->domain as NULL and let release clean everything up.
+* A NULL domain can happen only for first probe, in which case
+* we leave group->domain as NULL and let release clean
+* everything up.
 */
if (group->domain)
WARN_ON(__iommu_device_set_domain(
group, gdev->dev, group->domain,
IOMMU_SET_DOMAIN_MUST_SUCCEED));
-   else if (ops->set_platform_dma_ops)
-   ops->set_platform_dma_ops(gdev->dev);
if (gdev == last_gdev)
break;
}
@@ -2926,9 +2909,6 @@ static int iommu_setup_default_domain(struct iommu_group *group,
/*
 * There are still some drivers which don't support default domains, so
 * we ignore the failure and leave group->default_domain NULL.
-*
-* We assume that the iommu driver starts up the device in
-* 'set_platform_dma_ops' mode if it does not support default domains.
 */
dom = iommu_group_alloc_default_domain(group, req_type);
if (!dom) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ef0af09326..49331573f1d1f5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -237,9 +237,6 @@ struct iommu_iotlb_gather {
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
  *  group and attached to the groups domain
- * @set_platform_dma_ops: Returning control back to the platform DMA ops. This op
- * is to support old IOMMU drivers, new drivers should use
- * default domains, and the common IOMMU DMA ops.
  * @device_group: find iommu group for a particular device
  * @get_resv_regions: Request list of reserved regions for a device
  * @of_xlate: add OF master IDs to iommu grouping
@@ -271,7 +268,6 @@ struct iommu_ops {
struct iommu_device *(*probe_device)(struct device *dev);
void (*release_device)(struct device *dev);
void (*probe_finalize)(struct device *dev);
-   void (*set_platform_dma_ops)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
 
/* Request/Free a list of reserved regions for a device */
-- 
2.40.1



[PATCH v3 17/25] iommu/qcom_iommu: Add an IOMMU_IDENTITY_DOMAIN

2023-06-09 Thread Jason Gunthorpe
This brings back the ops->detach_dev() code that commit
1b932ceddd19 ("iommu: Remove detach_dev callbacks") deleted and turns it
into an IDENTITY domain.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index a503ed758ec302..9d7b9d8b4386d4 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -387,6 +387,44 @@ static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
return 0;
 }
 
+static int qcom_iommu_identity_attach(struct iommu_domain *identity_domain,
+ struct device *dev)
+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct qcom_iommu_domain *qcom_domain;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+   unsigned int i;
+
+   if (domain == identity_domain || !domain)
+   return 0;
+
+   qcom_domain = to_qcom_iommu_domain(domain);
+   if (WARN_ON(!qcom_domain->iommu))
+   return -EINVAL;
+
+   pm_runtime_get_sync(qcom_iommu->dev);
+   for (i = 0; i < fwspec->num_ids; i++) {
+   struct qcom_iommu_ctx *ctx = to_ctx(qcom_domain, fwspec->ids[i]);
+
+   /* Disable the context bank: */
+   iommu_writel(ctx, ARM_SMMU_CB_SCTLR, 0);
+
+   ctx->domain = NULL;
+   }
+   pm_runtime_put_sync(qcom_iommu->dev);
+   return 0;
+}
+
+static struct iommu_domain_ops qcom_iommu_identity_ops = {
+   .attach_dev = qcom_iommu_identity_attach,
+};
+
+static struct iommu_domain qcom_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &qcom_iommu_identity_ops,
+};
+
 static int qcom_iommu_map(struct iommu_domain *domain, unsigned long iova,
  phys_addr_t paddr, size_t pgsize, size_t pgcount,
  int prot, gfp_t gfp, size_t *mapped)
@@ -553,6 +591,7 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 }
 
 static const struct iommu_ops qcom_iommu_ops = {
+   .identity_domain = &qcom_iommu_identity_domain,
.capable= qcom_iommu_capable,
.domain_alloc   = qcom_iommu_domain_alloc,
.probe_device   = qcom_iommu_probe_device,
-- 
2.40.1



[PATCH v3 18/25] iommu/ipmmu: Add an IOMMU_IDENTITY_DOMAIN

2023-06-09 Thread Jason Gunthorpe
This brings back the ops->detach_dev() code that commit
1b932ceddd19 ("iommu: Remove detach_dev callbacks") deleted and turns it
into an IDENTITY domain.

Also reverts commit 584d334b1393 ("iommu/ipmmu-vmsa: Remove
ipmmu_utlb_disable()")

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/ipmmu-vmsa.c | 43 ++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9f64c5c9f5b90a..de958e411a92e0 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -298,6 +298,18 @@ static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain,
mmu->utlb_ctx[utlb] = domain->context_id;
 }
 
+/*
+ * Disable MMU translation for the microTLB.
+ */
+static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain,
+  unsigned int utlb)
+{
+   struct ipmmu_vmsa_device *mmu = domain->mmu;
+
+   ipmmu_imuctr_write(mmu, utlb, 0);
+   mmu->utlb_ctx[utlb] = IPMMU_CTX_INVALID;
+}
+
 static void ipmmu_tlb_flush_all(void *cookie)
 {
struct ipmmu_vmsa_domain *domain = cookie;
@@ -630,6 +642,36 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
return 0;
 }
 
+static int ipmmu_iommu_identity_attach(struct iommu_domain *identity_domain,
+  struct device *dev)
+{
+   struct iommu_domain *io_domain = iommu_get_domain_for_dev(dev);
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct ipmmu_vmsa_domain *domain;
+   unsigned int i;
+
+   if (io_domain == identity_domain || !io_domain)
+   return 0;
+
+   domain = to_vmsa_domain(io_domain);
+   for (i = 0; i < fwspec->num_ids; ++i)
+   ipmmu_utlb_disable(domain, fwspec->ids[i]);
+
+   /*
+* TODO: Optimize by disabling the context when no device is attached.
+*/
+   return 0;
+}
+
+static struct iommu_domain_ops ipmmu_iommu_identity_ops = {
+   .attach_dev = ipmmu_iommu_identity_attach,
+};
+
+static struct iommu_domain ipmmu_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &ipmmu_iommu_identity_ops,
+};
+
 static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
 phys_addr_t paddr, size_t pgsize, size_t pgcount,
 int prot, gfp_t gfp, size_t *mapped)
@@ -848,6 +890,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 }
 
 static const struct iommu_ops ipmmu_ops = {
+   .identity_domain = &ipmmu_iommu_identity_domain,
.domain_alloc = ipmmu_domain_alloc,
.probe_device = ipmmu_probe_device,
.release_device = ipmmu_release_device,
-- 
2.40.1



[PATCH v3 19/25] iommu/mtk_iommu: Add an IOMMU_IDENTITY_DOMAIN

2023-06-09 Thread Jason Gunthorpe
This brings back the ops->detach_dev() code that commit
1b932ceddd19 ("iommu: Remove detach_dev callbacks") deleted and turns it
into an IDENTITY domain.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/mtk_iommu.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index e93906d6e112e8..fdb7f5162b1d64 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -753,6 +753,28 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
return ret;
 }
 
+static int mtk_iommu_identity_attach(struct iommu_domain *identity_domain,
+struct device *dev)
+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
+
+   if (domain == identity_domain || !domain)
+   return 0;
+
+   mtk_iommu_config(data, dev, false, 0);
+   return 0;
+}
+
+static struct iommu_domain_ops mtk_iommu_identity_ops = {
+   .attach_dev = mtk_iommu_identity_attach,
+};
+
+static struct iommu_domain mtk_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &mtk_iommu_identity_ops,
+};
+
 static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t pgsize, size_t pgcount,
 int prot, gfp_t gfp, size_t *mapped)
@@ -972,6 +994,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
 }
 
 static const struct iommu_ops mtk_iommu_ops = {
+   .identity_domain = &mtk_iommu_identity_domain,
.domain_alloc   = mtk_iommu_domain_alloc,
.probe_device   = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
-- 
2.40.1



[PATCH v3 04/25] iommu: Add IOMMU_DOMAIN_PLATFORM for S390

2023-06-09 Thread Jason Gunthorpe
The PLATFORM domain will be set as the default domain and attached as
normal during probe. The driver will ignore the initial attach from a NULL
domain to the PLATFORM domain.

After this, the PLATFORM domain's attach_dev will be called whenever we
detach from an UNMANAGED domain (eg for VFIO). This is the same time the
original design would have called op->detach_dev().

This is temporary until the S390 dma-iommu.c conversion is merged.

Tested-by: Heiko Stuebner 
Tested-by: Niklas Schnelle 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/s390-iommu.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fbf59a8db29b11..f0c867c57a5b9b 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -142,14 +142,31 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
return 0;
 }
 
-static void s390_iommu_set_platform_dma(struct device *dev)
+/*
+ * Switch control over the IOMMU to S390's internal dma_api ops
+ */
+static int s390_iommu_platform_attach(struct iommu_domain *platform_domain,
+ struct device *dev)
 {
struct zpci_dev *zdev = to_zpci_dev(dev);
 
+   if (!zdev->s390_domain)
+   return 0;
+
__s390_iommu_detach_device(zdev);
zpci_dma_init_device(zdev);
+   return 0;
 }
 
+static struct iommu_domain_ops s390_iommu_platform_ops = {
+   .attach_dev = s390_iommu_platform_attach,
+};
+
+static struct iommu_domain s390_iommu_platform_domain = {
+   .type = IOMMU_DOMAIN_PLATFORM,
+   .ops = &s390_iommu_platform_ops,
+};
+
 static void s390_iommu_get_resv_regions(struct device *dev,
struct list_head *list)
 {
@@ -428,12 +445,12 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
 }
 
 static const struct iommu_ops s390_iommu_ops = {
+   .default_domain = &s390_iommu_platform_domain,
.capable = s390_iommu_capable,
.domain_alloc = s390_domain_alloc,
.probe_device = s390_iommu_probe_device,
.release_device = s390_iommu_release_device,
.device_group = generic_device_group,
-   .set_platform_dma_ops = s390_iommu_set_platform_dma,
.pgsize_bitmap = SZ_4K,
.get_resv_regions = s390_iommu_get_resv_regions,
.default_domain_ops = &(const struct iommu_domain_ops) {
-- 
2.40.1



[PATCH v3 03/25] powerpc/iommu: Setup a default domain and remove set_platform_dma_ops

2023-06-09 Thread Jason Gunthorpe
POWER is using the set_platform_dma_ops() callback to hook up its private
dma_ops, but this is buried under some indirection and weirdly happens
for a BLOCKED domain as well.

For better documentation create a PLATFORM domain to manage the dma_ops,
since that is what it is for, and make the BLOCKED domain an alias for
it. BLOCKED is required for VFIO.

Also remove the leaky allocation of the BLOCKED domain by using a global
static.
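The static-singleton pattern the patch switches to can be shown with stand-in types (not the kernel's <linux/iommu.h>; the names here are purely illustrative): two static domains share one ops table, and domain_alloc() hands out the singleton instead of kzalloc()ing a fresh domain that nothing ever frees.

```c
#include <stddef.h>

/* Stand-in types for the sketch */
struct iommu_domain_ops { int (*attach_dev)(void *domain, void *dev); };

enum { DOMAIN_PLATFORM = 1, DOMAIN_BLOCKED = 2 };

struct iommu_domain {
	int type;
	const struct iommu_domain_ops *ops;
};

static int platform_attach(void *domain, void *dev) { return 0; }

static const struct iommu_domain_ops platform_ops = {
	.attach_dev = platform_attach,
};

static struct iommu_domain platform_domain = {
	.type = DOMAIN_PLATFORM,
	.ops = &platform_ops,
};

/* BLOCKED is an alias: same ops table, different type label */
static struct iommu_domain blocked_domain = {
	.type = DOMAIN_BLOCKED,
	.ops = &platform_ops,
};

/* domain_alloc-style entry point: return the singleton, never allocate,
 * so there is nothing for callers to leak */
static struct iommu_domain *domain_alloc_blocked(void)
{
	return &blocked_domain;
}
```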

Signed-off-by: Jason Gunthorpe 
---
 arch/powerpc/kernel/iommu.c | 38 +
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 67f0b01e6ff575..0f17cd767e1676 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1266,7 +1266,7 @@ struct iommu_table_group_ops spapr_tce_table_group_ops = {
 /*
  * A simple iommu_ops to allow less cruft in generic VFIO code.
  */
-static int spapr_tce_blocking_iommu_attach_dev(struct iommu_domain *dom,
+static int spapr_tce_platform_iommu_attach_dev(struct iommu_domain *dom,
   struct device *dev)
 {
struct iommu_group *grp = iommu_group_get(dev);
@@ -1283,17 +1283,22 @@ static void spapr_tce_blocking_iommu_attach_dev(struct iommu_domain *dom,
return ret;
 }
 
-static void spapr_tce_blocking_iommu_set_platform_dma(struct device *dev)
-{
-   struct iommu_group *grp = iommu_group_get(dev);
-   struct iommu_table_group *table_group;
+static const struct iommu_domain_ops spapr_tce_platform_domain_ops = {
+   .attach_dev = spapr_tce_platform_iommu_attach_dev,
+};
 
-   table_group = iommu_group_get_iommudata(grp);
-   table_group->ops->release_ownership(table_group);
-}
+static struct iommu_domain spapr_tce_platform_domain = {
+   .type = IOMMU_DOMAIN_PLATFORM,
+   .ops = &spapr_tce_platform_domain_ops,
+};
 
-static const struct iommu_domain_ops spapr_tce_blocking_domain_ops = {
-   .attach_dev = spapr_tce_blocking_iommu_attach_dev,
+static struct iommu_domain spapr_tce_blocked_domain = {
+   .type = IOMMU_DOMAIN_BLOCKED,
+   /*
+* FIXME: SPAPR mixes blocked and platform behaviors, the blocked domain
+* also sets the dma_api ops
+*/
+   .ops = &spapr_tce_platform_domain_ops,
 };
 
 static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap)
@@ -1310,18 +1315,9 @@ static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap)
 
 static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
 {
-   struct iommu_domain *dom;
-
if (type != IOMMU_DOMAIN_BLOCKED)
return NULL;
-
-   dom = kzalloc(sizeof(*dom), GFP_KERNEL);
-   if (!dom)
-   return NULL;
-
-   dom->ops = &spapr_tce_blocking_domain_ops;
-
-   return dom;
+   return &spapr_tce_blocked_domain;
 }
 
 static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
@@ -1357,12 +1353,12 @@ static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
 }
 
 static const struct iommu_ops spapr_tce_iommu_ops = {
+   .default_domain = &spapr_tce_platform_domain,
.capable = spapr_tce_iommu_capable,
.domain_alloc = spapr_tce_iommu_domain_alloc,
.probe_device = spapr_tce_iommu_probe_device,
.release_device = spapr_tce_iommu_release_device,
.device_group = spapr_tce_iommu_device_group,
-   .set_platform_dma_ops = spapr_tce_blocking_iommu_set_platform_dma,
 };
 
 static struct attribute *spapr_tce_iommu_attrs[] = {
-- 
2.40.1



[PATCH v3 22/25] iommu: Add __iommu_group_domain_alloc()

2023-06-09 Thread Jason Gunthorpe
Allocate a domain from a group. Automatically obtains the iommu_ops to use
from the device list of the group. Convert the internal callers to use it.

Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 66 ---
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 98b855487cf03c..0346c05e108438 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -94,8 +94,8 @@ static const char * const iommu_group_resv_type_string[] = {
 static int iommu_bus_notifier(struct notifier_block *nb,
  unsigned long action, void *data);
 static void iommu_release_device(struct device *dev);
-static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus,
-unsigned type);
+static struct iommu_domain *
+__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type);
 static int __iommu_attach_device(struct iommu_domain *domain,
 struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
@@ -1652,12 +1652,11 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
 static struct iommu_domain *
-__iommu_group_alloc_default_domain(const struct bus_type *bus,
-  struct iommu_group *group, int req_type)
+__iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
 {
if (group->default_domain && group->default_domain->type == req_type)
return group->default_domain;
-   return __iommu_domain_alloc(bus, req_type);
+   return __iommu_group_domain_alloc(group, req_type);
 }
 
 /*
@@ -1667,9 +1666,10 @@ __iommu_group_alloc_default_domain(const struct bus_type *bus,
 static struct iommu_domain *
 iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
 {
-   const struct bus_type *bus =
+   struct device *dev =
list_first_entry(&group->devices, struct group_device, list)
-   ->dev->bus;
+   ->dev;
+   const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *dom;
 
lockdep_assert_held(&group->mutex);
@@ -1679,24 +1679,24 @@ iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
 * domain. This should always be either an IDENTITY or PLATFORM domain.
 * Do not use in new drivers.
 */
-   if (bus->iommu_ops->default_domain) {
+   if (ops->default_domain) {
if (req_type)
return ERR_PTR(-EINVAL);
-   return bus->iommu_ops->default_domain;
+   return ops->default_domain;
}
 
if (req_type)
-   return __iommu_group_alloc_default_domain(bus, group, req_type);
+   return __iommu_group_alloc_default_domain(group, req_type);
 
/* The driver gave no guidance on what type to use, try the default */
-   dom = __iommu_group_alloc_default_domain(bus, group,
-   iommu_def_domain_type);
+   dom = __iommu_group_alloc_default_domain(group, iommu_def_domain_type);
if (dom)
return dom;
 
/* Otherwise IDENTITY and DMA_FQ defaults will try DMA */
if (iommu_def_domain_type == IOMMU_DOMAIN_DMA)
return NULL;
-   dom = __iommu_group_alloc_default_domain(bus, group, IOMMU_DOMAIN_DMA);
+   dom = __iommu_group_alloc_default_domain(group, IOMMU_DOMAIN_DMA);
if (!dom)
return NULL;
 
@@ -1984,19 +1984,16 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
 
-static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus,
-unsigned type)
+static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
+unsigned int type)
 {
struct iommu_domain *domain;
unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS;
 
-   if (bus == NULL || bus->iommu_ops == NULL)
-   return NULL;
+   if (alloc_type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain)
+   return ops->identity_domain;
 
-   if (alloc_type == IOMMU_DOMAIN_IDENTITY &&
-   bus->iommu_ops->identity_domain)
-   return bus->iommu_ops->identity_domain;
-
-   domain = bus->iommu_ops->domain_alloc(alloc_type);
+   domain = ops->domain_alloc(alloc_type);
if (!domain)
return NULL;
 
@@ -2006,10 +2003,10 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus,
 * may override this later
 */
if (!domain->pgsize_bitmap)
-   domain->pgsize_bitmap = bus->iommu_ops-

[PATCH v3 23/25] iommu: Add ops->domain_alloc_paging()

2023-06-09 Thread Jason Gunthorpe
This callback requests the driver to create only a __IOMMU_DOMAIN_PAGING
domain, so it saves a few lines in a lot of drivers needlessly checking
the type.

More critically, this allows us to sweep out all the
IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA checks from a lot of the
drivers, simplifying what is going on in the code and ultimately removing
the now-unused special cases in drivers where they did not support
IOMMU_DOMAIN_DMA.

domain_alloc_paging() should return a struct iommu_domain that is
functionally compatible with ARM_DMA_USE_IOMMU, dma-iommu.c and iommufd.

Be forward-looking and pass in a 'struct device *' argument. We can
provide this when allocating the default_domain. No drivers will look at
this.
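A rough illustration of the driver-side shape, using stand-in types rather than the kernel headers (the foo_* names are invented for this sketch):

```c
#include <stdlib.h>

/* Minimal stand-ins for the kernel structures */
struct device { const char *name; };

struct iommu_domain {
	unsigned int type;
	unsigned long pgsize_bitmap;
};

#define SZ_4K 0x1000UL

/* With domain_alloc_paging() the driver is only ever asked for a
 * paging-capable domain, so the UNMANAGED/DMA/DMA_FQ type checks that
 * clutter domain_alloc() implementations disappear, and the optional
 * device pointer is available to guide the allocation. */
static struct iommu_domain *foo_domain_alloc_paging(struct device *dev)
{
	struct iommu_domain *domain = calloc(1, sizeof(*domain));

	if (!domain)
		return NULL;
	domain->pgsize_bitmap = SZ_4K;	/* driver's supported page sizes */
	(void)dev;			/* no driver inspects it yet */
	return domain;
}
```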

Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 13 ++---
 include/linux/iommu.h |  3 +++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0346c05e108438..2cf523ff9c6f55 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1985,6 +1985,7 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
 
 static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
+struct device *dev,
 unsigned int type)
 {
struct iommu_domain *domain;
@@ -1992,8 +1993,13 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
 
if (alloc_type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain)
return ops->identity_domain;
+   else if (type & __IOMMU_DOMAIN_PAGING) {
+   domain = ops->domain_alloc_paging(dev);
+   } else if (ops->domain_alloc)
+   domain = ops->domain_alloc(alloc_type);
+   else
+   return NULL;
 
-   domain = ops->domain_alloc(alloc_type);
if (!domain)
return NULL;
 
@@ -2024,14 +2030,15 @@ __iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
 
lockdep_assert_held(&group->mutex);
 
-   return __iommu_domain_alloc(dev_iommu_ops(dev), type);
+   return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type);
 }
 
 struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
 {
if (bus == NULL || bus->iommu_ops == NULL)
return NULL;
-   return __iommu_domain_alloc(bus->iommu_ops, IOMMU_DOMAIN_UNMANAGED);
+   return __iommu_domain_alloc(bus->iommu_ops, NULL,
+   IOMMU_DOMAIN_UNMANAGED);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 49331573f1d1f5..8e4d178c49c417 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -233,6 +233,8 @@ struct iommu_iotlb_gather {
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
  * @domain_alloc: allocate iommu domain
+ * @domain_alloc_paging: Allocate an iommu_domain that can be used for
+ *   UNMANAGED, DMA, and DMA_FQ domain types.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -264,6 +266,7 @@ struct iommu_ops {
 
/* Domain allocation and freeing by the iommu driver */
struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+   struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
 
struct iommu_device *(*probe_device)(struct device *dev);
void (*release_device)(struct device *dev);
-- 
2.40.1



[PATCH v3 07/25] iommu/mtk_iommu_v1: Implement an IDENTITY domain

2023-06-09 Thread Jason Gunthorpe
What mtk does during mtk_iommu_v1_set_platform_dma() is actually putting
the iommu into identity mode. Make this available as a proper IDENTITY
domain.

The mtk_iommu_v1_def_domain_type() from
commit 8bbe13f52cb7 ("iommu/mediatek-v1: Add def_domain_type") explains
this was needed to allow probe_finalize() to be called, but now the
IDENTITY domain will do the same job so change the returned
def_domain_type.

mtk_v1 is the only driver that returns IOMMU_DOMAIN_UNMANAGED from
def_domain_type(). This allows the next patch to enforce an IDENTITY
domain policy for this driver.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/mtk_iommu_v1.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 8a0a5e5d049f4a..cc3e7d53d33ad9 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -319,11 +319,27 @@ static int mtk_iommu_v1_attach_device(struct iommu_domain *domain, struct device
return 0;
 }
 
-static void mtk_iommu_v1_set_platform_dma(struct device *dev)
+static int mtk_iommu_v1_identity_attach(struct iommu_domain *identity_domain,
+   struct device *dev)
 {
struct mtk_iommu_v1_data *data = dev_iommu_priv_get(dev);
 
mtk_iommu_v1_config(data, dev, false);
+   return 0;
+}
+
+static struct iommu_domain_ops mtk_iommu_v1_identity_ops = {
+   .attach_dev = mtk_iommu_v1_identity_attach,
+};
+
+static struct iommu_domain mtk_iommu_v1_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &mtk_iommu_v1_identity_ops,
+};
+
+static void mtk_iommu_v1_set_platform_dma(struct device *dev)
+{
+   mtk_iommu_v1_identity_attach(&mtk_iommu_v1_identity_domain, dev);
 }
 
 static int mtk_iommu_v1_map(struct iommu_domain *domain, unsigned long iova,
@@ -443,7 +459,7 @@ static int mtk_iommu_v1_create_mapping(struct device *dev, struct of_phandle_arg
 
 static int mtk_iommu_v1_def_domain_type(struct device *dev)
 {
-   return IOMMU_DOMAIN_UNMANAGED;
+   return IOMMU_DOMAIN_IDENTITY;
 }
 
 static struct iommu_device *mtk_iommu_v1_probe_device(struct device *dev)
@@ -578,6 +594,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
 }
 
 static const struct iommu_ops mtk_iommu_v1_ops = {
+   .identity_domain = &mtk_iommu_v1_identity_domain,
.domain_alloc   = mtk_iommu_v1_domain_alloc,
.probe_device   = mtk_iommu_v1_probe_device,
.probe_finalize = mtk_iommu_v1_probe_finalize,
-- 
2.40.1



[PATCH v3 05/25] iommu/fsl_pamu: Implement a PLATFORM domain

2023-06-09 Thread Jason Gunthorpe
This driver is nonsensical. To avoid blocking the migration of the core
API away from NULL default_domains, give it a hacky PLATFORM domain that
keeps it working exactly as it always did.

Leave some comments around to warn away any future people looking at this.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/fsl_pamu_domain.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 4ac0e247ec2b51..e9d2bff4659b7c 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -196,6 +196,13 @@ static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type)
 {
struct fsl_dma_domain *dma_domain;
 
+   /*
+* FIXME: This isn't creating an unmanaged domain since the
+* default_domain_ops do not have any map/unmap function it doesn't meet
+* the requirements for __IOMMU_DOMAIN_PAGING. The only purpose seems to
+* allow drivers/soc/fsl/qbman/qman_portal.c to do
+* fsl_pamu_configure_l1_stash()
+*/
if (type != IOMMU_DOMAIN_UNMANAGED)
return NULL;
 
@@ -283,15 +290,33 @@ static int fsl_pamu_attach_device(struct iommu_domain *domain,
return ret;
 }
 
-static void fsl_pamu_set_platform_dma(struct device *dev)
+/*
+ * FIXME: fsl/pamu is completely broken in terms of how it works with the iommu
+ * API. Immediately after probe the HW is left in an IDENTITY translation and
+ * the driver provides a non-working UNMANAGED domain that it can switch over
+ * to. However it cannot switch back to an IDENTITY translation, instead it
+ * switches to what looks like BLOCKING.
+ */
+static int fsl_pamu_platform_attach(struct iommu_domain *platform_domain,
+   struct device *dev)
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
+   struct fsl_dma_domain *dma_domain;
const u32 *prop;
int len;
struct pci_dev *pdev = NULL;
struct pci_controller *pci_ctl;
 
+   /*
+* Hack to keep things working as they always have, only leaving an
+* UNMANAGED domain makes it BLOCKING.
+*/
+   if (domain == platform_domain || !domain ||
+   domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return 0;
+
+   dma_domain = to_fsl_dma_domain(domain);
+
/*
 * Use LIODN of the PCI controller while detaching a
 * PCI device.
@@ -312,8 +337,18 @@ static void fsl_pamu_set_platform_dma(struct device *dev)
detach_device(dev, dma_domain);
else
pr_debug("missing fsl,liodn property at %pOF\n", dev->of_node);
+   return 0;
 }
 
+static struct iommu_domain_ops fsl_pamu_platform_ops = {
+   .attach_dev = fsl_pamu_platform_attach,
+};
+
+static struct iommu_domain fsl_pamu_platform_domain = {
+   .type = IOMMU_DOMAIN_PLATFORM,
+   .ops = &fsl_pamu_platform_ops,
+};
+
 /* Set the domain stash attribute */
 int fsl_pamu_configure_l1_stash(struct iommu_domain *domain, u32 cpu)
 {
@@ -395,11 +430,11 @@ static struct iommu_device *fsl_pamu_probe_device(struct device *dev)
 }
 
 static const struct iommu_ops fsl_pamu_ops = {
+   .default_domain = &fsl_pamu_platform_domain,
.capable= fsl_pamu_capable,
.domain_alloc   = fsl_pamu_domain_alloc,
.probe_device   = fsl_pamu_probe_device,
.device_group   = fsl_pamu_device_group,
-   .set_platform_dma_ops = fsl_pamu_set_platform_dma,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = fsl_pamu_attach_device,
.iova_to_phys   = fsl_pamu_iova_to_phys,
-- 
2.40.1



[PATCH v3 09/25] iommu: Allow an IDENTITY domain as the default_domain in ARM32

2023-06-09 Thread Jason Gunthorpe
Even though dma-iommu.c and CONFIG_ARM_DMA_USE_IOMMU do approximately the
same stuff, the way they relate to the IOMMU core is quite different.

dma-iommu.c expects the core code to set up an UNMANAGED domain (of type
IOMMU_DOMAIN_DMA) and then configures itself to use that domain. This
becomes the default_domain for the group.

ARM_DMA_USE_IOMMU does not use the default_domain; instead it directly
allocates an UNMANAGED domain and operates it just like an external
driver. In this case group->default_domain is NULL.

If the driver provides a global static identity_domain then automatically
use it as the default_domain when in ARM_DMA_USE_IOMMU mode.

This allows drivers that implemented default_domain == NULL as an IDENTITY
translation to trivially get a properly labeled non-NULL default_domain on
ARM32 configs.

With this arrangement, when ARM_DMA_USE_IOMMU wants to disconnect from
the device, the normal detach_domain flow will restore the IDENTITY
domain as the default domain. Overall this means attach_dev() of the
IDENTITY domain is called in the same places as detach_dev().

This effectively migrates these drivers to default_domain mode. For
drivers that support ARM64 they will gain support for the IDENTITY
translation mode for the dma_api and behave in a uniform way.

Drivers use this by setting ops->identity_domain to a static singleton
iommu_domain that implements the identity attach. If the core detects
ARM_DMA_USE_IOMMU mode then it automatically attaches the IDENTITY domain
during probe.
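The driver-side pattern can be sketched with stand-in types (this is an illustrative mock, not the kernel's <linux/iommu.h>; the foo_* names are invented):

```c
/* Stand-in types for the sketch */
struct device { int dummy; };
struct iommu_domain;

struct iommu_domain_ops {
	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
};

struct iommu_domain {
	unsigned int type;
	const struct iommu_domain_ops *ops;
};

struct iommu_ops {
	struct iommu_domain *identity_domain;
};

#define IOMMU_DOMAIN_IDENTITY 4

static int attach_count;

static int foo_identity_attach(struct iommu_domain *identity_domain,
			       struct device *dev)
{
	/* a real driver programs the HW for 1:1 translation here */
	attach_count++;
	return 0;
}

static const struct iommu_domain_ops foo_identity_ops = {
	.attach_dev = foo_identity_attach,
};

/* One static singleton shared by every device: the core can attach it
 * at probe time without allocating anything. */
static struct iommu_domain foo_identity_domain = {
	.type = IOMMU_DOMAIN_IDENTITY,
	.ops = &foo_identity_ops,
};

static const struct iommu_ops foo_ops = {
	.identity_domain = &foo_identity_domain,
};
```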

Drivers can continue to prevent the use of DMA translation by returning
IOMMU_DOMAIN_IDENTITY from def_domain_type, this will completely prevent
IOMMU_DMA from running but will not impact ARM_DMA_USE_IOMMU.

This allows removing the set_platform_dma_ops() from every remaining
driver.

Remove the set_platform_dma_ops from rockchip and mtk_v1 as all it does
is set an existing global static identity domain. mtk_v1 does not support
IOMMU_DOMAIN_DMA and it does not compile on ARM64, so this transformation
is safe.

Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c  | 26 +++---
 drivers/iommu/mtk_iommu_v1.c   | 12 
 drivers/iommu/rockchip-iommu.c | 10 --
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c4fc46c210366..7ca70e2a3f51e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1757,15 +1757,35 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
int type;
 
lockdep_assert_held(&group->mutex);
+
+   /*
+* ARM32 drivers supporting CONFIG_ARM_DMA_USE_IOMMU can declare an
+* identity_domain and it will automatically become their default
+* domain. Later on ARM_DMA_USE_IOMMU will install its UNMANAGED domain.
+* Override the selection to IDENTITY if we are sure the driver supports
+* it.
+*/
+   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) && ops->identity_domain) {
+   type = IOMMU_DOMAIN_IDENTITY;
+   if (best_type && type && best_type != type)
+   goto err;
+   best_type = target_type = IOMMU_DOMAIN_IDENTITY;
+   }
+
for_each_group_device(group, gdev) {
type = best_type;
if (ops->def_domain_type) {
type = ops->def_domain_type(gdev->dev);
-   if (best_type && type && best_type != type)
+   if (best_type && type && best_type != type) {
+   /* Stick with the last driver override we saw */
+   best_type = type;
goto err;
+   }
}
 
-   if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted) {
+   /* No ARM32 using systems will set untrusted, it cannot work. */
+   if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted &&
+   !WARN_ON(IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU))) {
type = IOMMU_DOMAIN_DMA;
if (best_type && type && best_type != type)
goto err;
@@ -1790,7 +1810,7 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
"Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n",
iommu_domain_type_str(type), dev_name(last_dev),
iommu_domain_type_str(best_type));
-   return 0;
+   return best_type;
 }
 
 static void iommu_group_do_probe_finalize(struct device *dev)
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index cc3e7d53d33ad9..7c0c1d50df5f75 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -337,11 +337,6 @@ static stru

[PATCH v3 21/25] iommu: Require a default_domain for all iommu drivers

2023-06-09 Thread Jason Gunthorpe
At this point every iommu driver will cause a default_domain to be
selected, so we can finally remove this gap from the core code.

The following table explains what each driver supports and what the
resulting default_domain will be:

                                   ops->default_domain
                     IDENTITY   DMA   PLATFORM   v   ARM32      dma-iommu   ARCH
amd/iommu.c             Y        Y                   N/A        either
apple-dart.c            Y        Y                   N/A        either
arm-smmu.c              Y        Y                   IDENTITY   either
qcom_iommu.c            G        Y                   IDENTITY   either
arm-smmu-v3.c           Y        Y                   N/A        either
exynos-iommu.c          G        Y                   IDENTITY   either
fsl_pamu_domain.c       Y                            N/A        N/A         PLATFORM
intel/iommu.c           Y        Y                   N/A        either
ipmmu-vmsa.c            G        Y                   IDENTITY   either
msm_iommu.c             G                            IDENTITY   N/A
mtk_iommu.c             G        Y                   IDENTITY   either
mtk_iommu_v1.c          G                            IDENTITY   N/A
omap-iommu.c            G                            IDENTITY   N/A
rockchip-iommu.c        G        Y                   IDENTITY   either
s390-iommu.c            Y        Y                   N/A        N/A         PLATFORM
sprd-iommu.c            Y                            N/A        DMA
sun50i-iommu.c          G        Y                   IDENTITY   either
tegra-smmu.c            G        Y                   IDENTITY   IDENTITY
virtio-iommu.c          Y        Y                   N/A        either
spapr                   Y        Y                   N/A        N/A         PLATFORM
 * G means ops->identity_domain is used
 * N/A means the driver will not compile in this configuration

ARM32 drivers select an IDENTITY default domain either through
ops->identity_domain or by directly requesting an IDENTITY domain through
alloc_domain().

In ARM64 mode tegra-smmu will still block the use of dma-iommu.c and
forces an IDENTITY domain.

S390 uses a PLATFORM domain to represent when the dma_ops are set to the
s390 iommu code.

fsl_pamu uses an IDENTITY domain.

POWER SPAPR uses PLATFORM and blocking to enable its weird VFIO mode.

The x86 drivers continue unchanged.

After this patch group->default_domain is only NULL for a short period
during bus iommu probing while all the groups are constituted. Otherwise
it is always !NULL.

This completes changing the iommu subsystem driver contract to a system
where the current iommu_domain always represents some form of translation
and the driver is continuously asserting a definable translation mode.

It resolves the confusion that the original ops->detach_dev() caused
around what translation, exactly, is the IOMMU performing after
detach. There were at least three different answers to that question in
the tree, they are all now clearly named with domain types.

Tested-by: Heiko Stuebner 
Tested-by: Niklas Schnelle 
Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e60640f6ccb625..98b855487cf03c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1805,10 +1805,12 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
 * ARM32 drivers supporting CONFIG_ARM_DMA_USE_IOMMU can declare an
 * identity_domain and it will automatically become their default
 * domain. Later on ARM_DMA_USE_IOMMU will install its UNMANAGED domain.
-* Override the selection to IDENTITY if we are sure the driver supports
-* it.
+* Override the selection to IDENTITY.
 */
-   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) && ops->identity_domain) {
+   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) {
+   static_assert(!(IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) &&
+   IS_ENABLED(CONFIG_IOMMU_DMA)));
+
type = IOMMU_DOMAIN_IDENTITY;
if (best_type && type && best_type != type)
goto err;
@@ -2906,18 +2908,9 @@ static int iommu_setup_default_domain(struct iommu_group *group,
if (req_type < 0)
return -EINVAL;
 
-   /*
-* There are still some drivers which don't support default domains, so
-* we ignore the failure and leave group->default_domain NULL.
-*/
dom = iommu_group_alloc_default_domain(group, req_type);
-   if (!dom) {
-   /* Once in default_domain mode we neve

[PATCH v3 11/25] iommu/tegra-smmu: Implement an IDENTITY domain

2023-06-09 Thread Jason Gunthorpe
What tegra-smmu does during tegra_smmu_set_platform_dma() is actually
putting the iommu into identity mode.

Move to the new core support for ARM_DMA_USE_IOMMU by defining
ops->identity_domain.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/tegra-smmu.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 1cbf063ccf147a..f63f1d4f0bd10f 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -511,23 +511,39 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
return err;
 }
 
-static void tegra_smmu_set_platform_dma(struct device *dev)
+static int tegra_smmu_identity_attach(struct iommu_domain *identity_domain,
+ struct device *dev)
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct tegra_smmu_as *as = to_smmu_as(domain);
-   struct tegra_smmu *smmu = as->smmu;
+   struct tegra_smmu_as *as;
+   struct tegra_smmu *smmu;
unsigned int index;
 
if (!fwspec)
-   return;
+   return -ENODEV;
 
+   if (domain == identity_domain || !domain)
+   return 0;
+
+   as = to_smmu_as(domain);
+   smmu = as->smmu;
for (index = 0; index < fwspec->num_ids; index++) {
tegra_smmu_disable(smmu, fwspec->ids[index], as->id);
tegra_smmu_as_unprepare(smmu, as);
}
+   return 0;
 }
 
+static struct iommu_domain_ops tegra_smmu_identity_ops = {
+   .attach_dev = tegra_smmu_identity_attach,
+};
+
+static struct iommu_domain tegra_smmu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &tegra_smmu_identity_ops,
+};
+
 static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
   u32 value)
 {
@@ -962,11 +978,22 @@ static int tegra_smmu_of_xlate(struct device *dev,
return iommu_fwspec_add_ids(dev, &id, 1);
 }
 
+static int tegra_smmu_def_domain_type(struct device *dev)
+{
+   /*
+* FIXME: For now we want to run all translation in IDENTITY mode, due
+* to some device quirks. Better would be to just quirk the troubled
+* devices.
+*/
+   return IOMMU_DOMAIN_IDENTITY;
+}
+
 static const struct iommu_ops tegra_smmu_ops = {
+   .identity_domain = &tegra_smmu_identity_domain,
+   .def_domain_type = &tegra_smmu_def_domain_type,
.domain_alloc = tegra_smmu_domain_alloc,
.probe_device = tegra_smmu_probe_device,
.device_group = tegra_smmu_device_group,
-   .set_platform_dma_ops = tegra_smmu_set_platform_dma,
.of_xlate = tegra_smmu_of_xlate,
.pgsize_bitmap = SZ_4K,
.default_domain_ops = &(const struct iommu_domain_ops) {
-- 
2.40.1



[PATCH v3 08/25] iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()

2023-06-09 Thread Jason Gunthorpe
Except for dart, every driver returns 0 or IDENTITY from def_domain_type().

The drivers that return IDENTITY have some kind of good reason, typically
that quirky hardware really can't support anything other than IDENTITY.

Arrange things so that if the driver says it needs IDENTITY then
iommu_get_default_domain_type() either fails or returns IDENTITY.  It will
never reject the driver's override to IDENTITY.

The only real functional difference is that the PCI untrusted flag is now
ignored for quirky HW instead of overriding the IOMMU driver.

This makes the next patch cleaner that wants to force IDENTITY always for
ARM_IOMMU because there is no support for DMA.
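The resulting selection rule can be modelled as a tiny pure function. This is a simplification of the per-device loop for illustration, not the kernel code itself:

```c
/* Simplified model of how iommu_get_default_domain_type() folds each
 * device's answer into the group's best_type: 0 means "no opinion",
 * and any two conflicting non-zero answers are an error (-1). */
enum { DOM_ANY = 0, DOM_DMA = 1, DOM_IDENTITY = 2 };

static int merge_domain_type(int best_type, int type)
{
	if (best_type && type && best_type != type)
		return -1;	/* conflicting per-device requirements */
	return type ? type : best_type;
}
```

A driver's def_domain_type() override simply becomes the running best_type, which is why an IDENTITY override can never be silently replaced, only rejected.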

Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 66 +--
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c8f6664767152d..0c4fc46c210366 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1608,19 +1608,6 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
-static int iommu_get_def_domain_type(struct device *dev)
-{
-   const struct iommu_ops *ops = dev_iommu_ops(dev);
-
-   if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)
-   return IOMMU_DOMAIN_DMA;
-
-   if (ops->def_domain_type)
-   return ops->def_domain_type(dev);
-
-   return 0;
-}
-
 static struct iommu_domain *
 __iommu_group_alloc_default_domain(const struct bus_type *bus,
   struct iommu_group *group, int req_type)
@@ -1761,36 +1748,49 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 static int iommu_get_default_domain_type(struct iommu_group *group,
 int target_type)
 {
+   const struct iommu_ops *ops = dev_iommu_ops(
+   list_first_entry(&group->devices, struct group_device, list)
+   ->dev);
int best_type = target_type;
struct group_device *gdev;
struct device *last_dev;
+   int type;
 
lockdep_assert_held(&group->mutex);
-
for_each_group_device(group, gdev) {
-   unsigned int type = iommu_get_def_domain_type(gdev->dev);
-
-   if (best_type && type && best_type != type) {
-   if (target_type) {
-   dev_err_ratelimited(
-   gdev->dev,
-   "Device cannot be in %s domain\n",
-   iommu_domain_type_str(target_type));
-   return -1;
-   }
-
-   dev_warn(
-   gdev->dev,
-   "Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n",
-   iommu_domain_type_str(type), dev_name(last_dev),
-   iommu_domain_type_str(best_type));
-   return 0;
+   type = best_type;
+   if (ops->def_domain_type) {
+   type = ops->def_domain_type(gdev->dev);
+   if (best_type && type && best_type != type)
+   goto err;
}
-   if (!best_type)
-   best_type = type;
+
+   if (dev_is_pci(gdev->dev) && to_pci_dev(gdev->dev)->untrusted) {
+   type = IOMMU_DOMAIN_DMA;
+   if (best_type && type && best_type != type)
+   goto err;
+   }
+   best_type = type;
last_dev = gdev->dev;
}
return best_type;
+
+err:
+   if (target_type) {
+   dev_err_ratelimited(
+   gdev->dev,
+   "Device cannot be in %s domain - it is forcing %s\n",
+   iommu_domain_type_str(target_type),
+   iommu_domain_type_str(type));
+   return -1;
+   }
+
+   dev_warn(
+   gdev->dev,
+   "Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n",
+   iommu_domain_type_str(type), dev_name(last_dev),
+   iommu_domain_type_str(best_type));
+   return 0;
 }
 
 static void iommu_group_do_probe_finalize(struct device *dev)
-- 
2.40.1



[PATCH v3 02/25] iommu: Add IOMMU_DOMAIN_PLATFORM

2023-06-09 Thread Jason Gunthorpe
This is used when the iommu driver is taking control of the dma_ops,
currently only on S390 and power spapr. It is designed to preserve the
original ops->detach_dev() semantics that these drivers were built around.

Provide an opaque domain type and a 'default_domain' ops value that allows
the driver to trivially force any single domain as the default domain.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 14 +-
 include/linux/iommu.h |  6 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bb840a818525ad..c8f6664767152d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1644,6 +1644,17 @@ iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
 
lockdep_assert_held(&group->mutex);
 
+   /*
+* Allow legacy drivers to specify the domain that will be the default
+* domain. This should always be either an IDENTITY or PLATFORM domain.
+* Do not use in new drivers.
+*/
+   if (bus->iommu_ops->default_domain) {
+   if (req_type)
+   return ERR_PTR(-EINVAL);
+   return bus->iommu_ops->default_domain;
+   }
+
if (req_type)
return __iommu_group_alloc_default_domain(bus, group, req_type);
 
@@ -1953,7 +1964,8 @@ void iommu_domain_free(struct iommu_domain *domain)
if (domain->type == IOMMU_DOMAIN_SVA)
mmdrop(domain->mm);
iommu_put_dma_cookie(domain);
-   domain->ops->free(domain);
+   if (domain->ops->free)
+   domain->ops->free(domain);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c3004eac2f88e8..ef0af09326 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,7 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
 
 #define __IOMMU_DOMAIN_SVA (1U << 4)  /* Shared process address space */
+#define __IOMMU_DOMAIN_PLATFORM (1U << 5)
 
 #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
 /*
@@ -81,6 +82,8 @@ struct iommu_domain_geometry {
  *   invalidation.
  * IOMMU_DOMAIN_SVA- DMA addresses are shared process addresses
  *   represented by mm_struct's.
+ * IOMMU_DOMAIN_PLATFORM   - Legacy domain for drivers that do their own
+ *   dma_api stuff. Do not use in new drivers.
  */
 #define IOMMU_DOMAIN_BLOCKED   (0U)
 #define IOMMU_DOMAIN_IDENTITY  (__IOMMU_DOMAIN_PT)
@@ -91,6 +94,7 @@ struct iommu_domain_geometry {
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_PLATFORM  (__IOMMU_DOMAIN_PLATFORM)
 
 struct iommu_domain {
unsigned type;
@@ -256,6 +260,7 @@ struct iommu_iotlb_gather {
  * @owner: Driver module providing these ops
  * @identity_domain: An always available, always attachable identity
  *   translation.
+ * @default_domain: If not NULL this will always be set as the default domain.
  */
 struct iommu_ops {
bool (*capable)(struct device *dev, enum iommu_cap);
@@ -290,6 +295,7 @@ struct iommu_ops {
unsigned long pgsize_bitmap;
struct module *owner;
struct iommu_domain *identity_domain;
+   struct iommu_domain *default_domain;
 };
 
 /**
-- 
2.40.1



[PATCH v3 01/25] iommu: Add iommu_ops->identity_domain

2023-06-09 Thread Jason Gunthorpe
This allows a driver to set a global static to an IDENTITY domain and
the core code will automatically use it whenever an IDENTITY domain
is requested.

By making it always available it means the IDENTITY can be used in error
handling paths to force the iommu driver into a known state. Devices
implementing global static identity domains should avoid failing their
attach_dev ops.

Convert rockchip to use the new mechanism.

Tested-by: Steven Price 
Tested-by: Marek Szyprowski 
Tested-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c  | 3 +++
 drivers/iommu/rockchip-iommu.c | 9 +
 include/linux/iommu.h  | 3 +++
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9e0228ef612b85..bb840a818525ad 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1917,6 +1917,9 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus,
if (bus == NULL || bus->iommu_ops == NULL)
return NULL;
 
+   if (alloc_type == IOMMU_DOMAIN_IDENTITY && bus->iommu_ops->identity_domain)
+   return bus->iommu_ops->identity_domain;
+
domain = bus->iommu_ops->domain_alloc(alloc_type);
if (!domain)
return NULL;
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4054030c323795..4fbede269e6712 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1017,13 +1017,8 @@ static int rk_iommu_identity_attach(struct iommu_domain 
*identity_domain,
return 0;
 }
 
-static void rk_iommu_identity_free(struct iommu_domain *domain)
-{
-}
-
 static struct iommu_domain_ops rk_identity_ops = {
.attach_dev = rk_iommu_identity_attach,
-   .free = rk_iommu_identity_free,
 };
 
 static struct iommu_domain rk_identity_domain = {
@@ -1087,9 +1082,6 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
 {
struct rk_iommu_domain *rk_domain;
 
-   if (type == IOMMU_DOMAIN_IDENTITY)
-   return &rk_identity_domain;
-
if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
@@ -1214,6 +1206,7 @@ static int rk_iommu_of_xlate(struct device *dev,
 }
 
 static const struct iommu_ops rk_iommu_ops = {
+   .identity_domain = &rk_identity_domain,
.domain_alloc = rk_iommu_domain_alloc,
.probe_device = rk_iommu_probe_device,
.release_device = rk_iommu_release_device,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d3164259667599..c3004eac2f88e8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -254,6 +254,8 @@ struct iommu_iotlb_gather {
  *will be blocked by the hardware.
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
+ * @identity_domain: An always available, always attachable identity
+ *   translation.
  */
 struct iommu_ops {
bool (*capable)(struct device *dev, enum iommu_cap);
@@ -287,6 +289,7 @@ struct iommu_ops {
const struct iommu_domain_ops *default_domain_ops;
unsigned long pgsize_bitmap;
struct module *owner;
+   struct iommu_domain *identity_domain;
 };
 
 /**
-- 
2.40.1



[PATCH v3 12/25] iommu/tegra-smmu: Support DMA domains in tegra

2023-06-09 Thread Jason Gunthorpe
All ARM64 iommu drivers should support IOMMU_DOMAIN_DMA to enable
dma-iommu.c.

tegra is blocking dma-iommu usage, and also default_domain's, because it
wants an identity translation. This is needed for some device quirk. The
correct way to do this is to support IDENTITY domains and use
ops->def_domain_type() to return IOMMU_DOMAIN_IDENTITY for only the quirky
devices.

Add support for IOMMU_DOMAIN_DMA and force IOMMU_DOMAIN_IDENTITY mode for
everything so no behavior changes.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/tegra-smmu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index f63f1d4f0bd10f..6cba034905edbf 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -276,7 +276,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
struct tegra_smmu_as *as;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
as = kzalloc(sizeof(*as), GFP_KERNEL);
@@ -989,6 +989,12 @@ static int tegra_smmu_def_domain_type(struct device *dev)
 }
 
 static const struct iommu_ops tegra_smmu_ops = {
+   /*
+* FIXME: For now we want to run all translation in IDENTITY mode,
+* better would be to have a def_domain_type op do this for just the
+* quirky device.
+*/
+   .default_domain = &tegra_smmu_identity_domain,
.identity_domain = &tegra_smmu_identity_domain,
.def_domain_type = &tegra_smmu_def_domain_type,
.domain_alloc = tegra_smmu_domain_alloc,
-- 
2.40.1



[PATCH v3 13/25] iommu/omap: Implement an IDENTITY domain

2023-06-09 Thread Jason Gunthorpe
What omap does during omap_iommu_set_platform_dma() is actually putting
the iommu into identity mode.

Move to the new core support for ARM_DMA_USE_IOMMU by defining
ops->identity_domain.

This driver does not support IOMMU_DOMAIN_DMA; however, it cannot be
compiled on ARM64 either. Most likely it would be fine to support dma-iommu.c.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/omap-iommu.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 537e402f9bba97..34340ef15241bc 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1555,16 +1555,31 @@ static void _omap_iommu_detach_dev(struct omap_iommu_domain *omap_domain,
omap_domain->dev = NULL;
 }
 
-static void omap_iommu_set_platform_dma(struct device *dev)
+static int omap_iommu_identity_attach(struct iommu_domain *identity_domain,
+ struct device *dev)
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct omap_iommu_domain *omap_domain = to_omap_domain(domain);
+   struct omap_iommu_domain *omap_domain;
 
+   if (domain == identity_domain || !domain)
+   return 0;
+
+   omap_domain = to_omap_domain(domain);
spin_lock(&omap_domain->lock);
_omap_iommu_detach_dev(omap_domain, dev);
spin_unlock(&omap_domain->lock);
+   return 0;
 }
 
+static struct iommu_domain_ops omap_iommu_identity_ops = {
+   .attach_dev = omap_iommu_identity_attach,
+};
+
+static struct iommu_domain omap_iommu_identity_domain = {
+   .type = IOMMU_DOMAIN_IDENTITY,
+   .ops = &omap_iommu_identity_ops,
+};
+
 static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
 {
struct omap_iommu_domain *omap_domain;
@@ -1732,11 +1747,11 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
 }
 
 static const struct iommu_ops omap_iommu_ops = {
+   .identity_domain = &omap_iommu_identity_domain,
.domain_alloc   = omap_iommu_domain_alloc,
.probe_device   = omap_iommu_probe_device,
.release_device = omap_iommu_release_device,
.device_group   = omap_iommu_device_group,
-   .set_platform_dma_ops = omap_iommu_set_platform_dma,
.pgsize_bitmap  = OMAP_IOMMU_PGSIZES,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = omap_iommu_attach_dev,
-- 
2.40.1



Re: [PATCH] powerpc/ftrace: Disable ftrace on ppc32 if using clang

2023-06-09 Thread Nick Desaulniers
On Thu, Jun 8, 2023 at 8:47 PM Naveen N Rao  wrote:
>
> Ftrace on ppc32 expects a three instruction sequence at the beginning of
> each function when specifying -pg:
> mflrr0
> stw r0,4(r1)
> bl  _mcount
>
> This is the case with all supported versions of gcc. Clang however emits
> a branch to _mcount after the function prologue, similar to the pre
> -mprofile-kernel ABI on ppc64. This is not supported.
>
> Disable ftrace on ppc32 if using clang for now. This can be re-enabled
> later if clang picks up support for -fpatchable-function-entry on ppc32.
>
> Signed-off-by: Naveen N Rao 

Thanks for the patch! I've filed the bug below; I'd like to see a link
to it retained in the commit message. In the future, please file bugs
against the compiler vendors first, then include the relevant link.

Link: https://github.com/llvm/llvm-project/issues/63220
Acked-by: Nick Desaulniers 

> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index bff5820b7cda14..d85e3cf4016d90 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -236,7 +236,7 @@ config PPC
> select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1
> select HAVE_FUNCTION_ERROR_INJECTION
> select HAVE_FUNCTION_GRAPH_TRACER
> -   select HAVE_FUNCTION_TRACER
> +   select HAVE_FUNCTION_TRACER if PPC64 || (PPC32 && CC_IS_GCC)
> select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200   # plugin support on gcc <= 5.1 is buggy on PPC
> select HAVE_GENERIC_VDSO
> select HAVE_HARDLOCKUP_DETECTOR_ARCHif PPC_BOOK3S_64 && SMP
>
> base-commit: bd517a8442b6c6646a136421cd4c1b95bf4ce32b
> --
> 2.40.1
>


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v2 25/25] iommu: Convert remaining simple drivers to domain_alloc_paging()

2023-06-09 Thread Jason Gunthorpe
On Thu, Jun 01, 2023 at 08:47:28PM +0100, Robin Murphy wrote:
> > diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
> > index ca4f5ebf028783..8d5d6a3acf9dfd 100644
> > --- a/drivers/iommu/fsl_pamu_domain.c
> > +++ b/drivers/iommu/fsl_pamu_domain.c
> > @@ -192,13 +192,10 @@ static void fsl_pamu_domain_free(struct iommu_domain *domain)
> > kmem_cache_free(fsl_pamu_domain_cache, dma_domain);
> >   }
> > -static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type)
> > +static struct iommu_domain *fsl_pamu_domain_alloc_paging(struct device *dev)
> 
> This isn't a paging domain - it doesn't support map/unmap, and AFAICT all it
> has ever been intended to do is "isolate" accesses to within an aperture
> which is never set to anything less than the entire physical address space
> :/
> 
> I hate to imagine what the VFIO userspace applications looked
> like...

After looking at this some more I don't think there is any VFIO
userspace..

There is a VFIO bus driver drivers/vfio/fsl-mc/ but from what I can
tell that is for the ARM version of this platform (from 2020, not
2014) and it doesn't use this driver.

So, really, the only thing this driver does is setup the identity
domain at boot and do something special for
drivers/soc/fsl/qbman/qman_portal.c :\

I wonder if we should just delete it, any chance the power systems
need the code to switch to identity at boot?

Michael do you have an opinion?

Jason


Re: [PATCH v14 00/15] phy: Add support for Lynx 10G SerDes

2023-06-09 Thread Sean Anderson
On 5/22/23 11:00, Vladimir Oltean wrote:
> On Mon, May 22, 2023 at 10:42:04AM -0400, Sean Anderson wrote:
>> Have you had a chance to review this driver?
> 
> Partially / too little (and no, I don't have an answer yet). I am
> debugging a SERDES protocol change procedure from XFI to SGMII.

I'd just like to reiterate that, like I said in the cover letter, I
believe this driver still has value even if it cannot yet perform
protocol switching.

Please send me your feedback, and I will try and incorporate it into the
next revision. Previously, you said you had major objections to the
contents of this series, but you still have not listed them.

--Sean


Re: [PATCH v9 2/4] tpm: of: Make of-tree specific function commonly available

2023-06-09 Thread Stefan Berger




On 6/9/23 14:18, Jarkko Sakkinen wrote:
> On Thu May 25, 2023 at 1:56 AM EEST, Jerry Snitselaar wrote:
>> On Tue, Apr 18, 2023 at 09:44:07AM -0400, Stefan Berger wrote:
>>> Simplify tpm_read_log_of() by moving reusable parts of the code into
>>> an inline function that makes it commonly available so it can be
>>> used also for kexec support. Call the new of_tpm_get_sml_parameters()
>>> function from the TPM Open Firmware driver.
>>>
>>> Signed-off-by: Stefan Berger 
>>> Cc: Jarkko Sakkinen 
>>> Cc: Jason Gunthorpe 
>>> Cc: Rob Herring 
>>> Cc: Frank Rowand 
>>> Reviewed-by: Mimi Zohar 
>>> Tested-by: Nageswara R Sastry 
>>> Tested-by: Coiby Xu 
>>> Acked-by: Jarkko Sakkinen 
>>
>> Reviewed-by: Jerry Snitselaar 
>
> If I just pick tpm only patches they won't apply so maybe TPM changes
> should be better separated if that is by any means possible.


Per the comment here, I am putting this series on hold:
https://lore.kernel.org/linux-integrity/20230418134409.177485-1-stef...@linux.ibm.com/T/#m03745c2af2c46f19f329522fcb6ccb2bf2eaedc7


BR,
   Stefan


[PATCH] powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsing

2023-06-09 Thread Rob Herring
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Refactor the FSL RapidIO
"ranges" parsing to use of_range_to_resource() instead.

One change is that the original code would look for "#size-cells" and
"#address-cells" in the parent node if not found in the port child
nodes. That is non-standard behavior and not necessary AFAICT. In 2011
in commit 54986964c13c ("powerpc/85xx: Update SRIO device tree nodes")
there was an ABI break. The upstream .dts files have been correct since
at least that point.

Signed-off-by: Rob Herring 
---
 arch/powerpc/sysdev/fsl_rio.c | 34 --
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index f8e492ee54cc..18176d0df612 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -453,8 +453,8 @@ int fsl_rio_setup(struct platform_device *dev)
struct device_node *np, *rmu_node;
int rlen;
u32 ccsr;
-   u64 range_start, range_size;
-   int paw, aw, sw;
+   u64 range_start;
+   int aw;
u32 i;
static int tmp;
struct device_node *rmu_np[MAX_MSG_UNIT_NUM] = {NULL};
@@ -569,6 +569,8 @@ int fsl_rio_setup(struct platform_device *dev)
 
/*set up ports node*/
for_each_child_of_node(dev->dev.of_node, np) {
+   struct resource res;
+
port_index = of_get_property(np, "cell-index", NULL);
if (!port_index) {
dev_err(&dev->dev, "Can't get %pOF property 'cell-index'\n",
@@ -576,32 +578,14 @@ int fsl_rio_setup(struct platform_device *dev)
continue;
}
 
-   dt_range = of_get_property(np, "ranges", &rlen);
-   if (!dt_range) {
+   if (of_range_to_resource(np, 0, &res)) {
dev_err(&dev->dev, "Can't get %pOF property 'ranges'\n",
np);
continue;
}
 
-   /* Get node address wide */
-   cell = of_get_property(np, "#address-cells", NULL);
-   if (cell)
-   aw = *cell;
-   else
-   aw = of_n_addr_cells(np);
-   /* Get node size wide */
-   cell = of_get_property(np, "#size-cells", NULL);
-   if (cell)
-   sw = *cell;
-   else
-   sw = of_n_size_cells(np);
-   /* Get parent address wide wide */
-   paw = of_n_addr_cells(np);
-   range_start = of_read_number(dt_range + aw, paw);
-   range_size = of_read_number(dt_range + aw + paw, sw);
-
-   dev_info(&dev->dev, "%pOF: LAW start 0x%016llx, size 0x%016llx.\n",
-   np, range_start, range_size);
+   dev_info(&dev->dev, "%pOF: LAW %pR\n",
+   np, &res);
 
port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL);
if (!port)
@@ -624,9 +608,7 @@ int fsl_rio_setup(struct platform_device *dev)
}
 
INIT_LIST_HEAD(&port->dbells);
-   port->iores.start = range_start;
-   port->iores.end = port->iores.start + range_size - 1;
-   port->iores.flags = IORESOURCE_MEM;
+   port->iores = res;  /* struct copy */
port->iores.name = "rio_io_win";
 
if (request_resource(&iomem_resource, &port->iores) < 0) {
-- 
2.39.2



[PATCH] powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsing

2023-06-09 Thread Rob Herring
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Refactor the FSL SoC
"ranges" parsing to use of_range_to_resource() instead.

Signed-off-by: Rob Herring 
---
 arch/powerpc/sysdev/fsl_soc.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index 6ebbbca41065..68709743450e 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -51,18 +51,10 @@ phys_addr_t get_immrbase(void)
 
soc = of_find_node_by_type(NULL, "soc");
if (soc) {
-   int size;
-   u32 naddr;
-   const __be32 *prop = of_get_property(soc, "#address-cells", &size);
-
-   if (prop && size == 4)
-   naddr = be32_to_cpup(prop);
-   else
-   naddr = 2;
-
-   prop = of_get_property(soc, "ranges", &size);
-   if (prop)
-   immrbase = of_translate_address(soc, prop + naddr);
+   struct resource res;
+
+   if (!of_range_to_resource(soc, 0, &res))
+   immrbase = res.start;
 
of_node_put(soc);
}
-- 
2.39.2



[PATCH] powerpc: mpc512x: Remove open coded "ranges" parsing

2023-06-09 Thread Rob Herring
"ranges" is a standard property, and we have common helper functions
for parsing it, so let's use the for_each_of_range() iterator.

Signed-off-by: Rob Herring 
---
 arch/powerpc/platforms/512x/mpc512x_lpbfifo.c | 46 ++-
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/platforms/512x/mpc512x_lpbfifo.c b/arch/powerpc/platforms/512x/mpc512x_lpbfifo.c
index 04bf6ecf7d55..1bfb29574caa 100644
--- a/arch/powerpc/platforms/512x/mpc512x_lpbfifo.c
+++ b/arch/powerpc/platforms/512x/mpc512x_lpbfifo.c
@@ -373,50 +373,32 @@ static int get_cs_ranges(struct device *dev)
 {
int ret = -ENODEV;
struct device_node *lb_node;
-   const u32 *addr_cells_p;
-   const u32 *size_cells_p;
-   int proplen;
-   size_t i;
+   size_t i = 0;
+   struct of_range_parser parser;
+   struct of_range range;
 
lb_node = of_find_compatible_node(NULL, NULL, "fsl,mpc5121-localbus");
if (!lb_node)
return ret;
 
-   /*
-* The node defined as compatible with 'fsl,mpc5121-localbus'
-* should have two address cells and one size cell.
-* Every item of its ranges property should consist of:
-* - the first address cell which is the chipselect number;
-* - the second address cell which is the offset in the chipselect,
-*must be zero.
-* - CPU address of the beginning of an access window;
-* - the only size cell which is the size of an access window.
-*/
-   addr_cells_p = of_get_property(lb_node, "#address-cells", NULL);
-   size_cells_p = of_get_property(lb_node, "#size-cells", NULL);
-   if (addr_cells_p == NULL || *addr_cells_p != 2 ||
-   size_cells_p == NULL || *size_cells_p != 1) {
-   goto end;
-   }
-
-   proplen = of_property_count_u32_elems(lb_node, "ranges");
-   if (proplen <= 0 || proplen % 4 != 0)
-   goto end;
+   of_range_parser_init(&parser, lb_node);
+   lpbfifo.cs_n = of_range_count(&parser);
 
-   lpbfifo.cs_n = proplen / 4;
lpbfifo.cs_ranges = devm_kcalloc(dev, lpbfifo.cs_n,
sizeof(struct cs_range), GFP_KERNEL);
if (!lpbfifo.cs_ranges)
goto end;
 
-   if (of_property_read_u32_array(lb_node, "ranges",
-   (u32 *)lpbfifo.cs_ranges, proplen) != 0) {
-   goto end;
-   }
-
-   for (i = 0; i < lpbfifo.cs_n; i++) {
-   if (lpbfifo.cs_ranges[i].base != 0)
+   for_each_of_range(&parser, &range) {
+   u32 base = lower_32_bits(range.bus_addr);
+   if (base)
goto end;
+
+   lpbfifo.cs_ranges[i].csnum = upper_32_bits(range.bus_addr);
+   lpbfifo.cs_ranges[i].base = base;
+   lpbfifo.cs_ranges[i].addr = range.cpu_addr;
+   lpbfifo.cs_ranges[i].size = range.size;
+   i++;
}
 
ret = 0;
-- 
2.39.2



[PATCH] powerpc: fsl: Use of_property_read_reg() to parse "reg"

2023-06-09 Thread Rob Herring
Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.

Signed-off-by: Rob Herring 
---
 arch/powerpc/sysdev/fsl_rio.c | 14 +++---
 arch/powerpc/sysdev/fsl_rmu.c |  9 +
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index 18176d0df612..33ba1676ef5a 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -448,13 +448,11 @@ int fsl_rio_setup(struct platform_device *dev)
struct rio_mport *port;
struct rio_priv *priv;
int rc = 0;
-   const u32 *dt_range, *cell, *port_index;
+   const u32 *cell, *port_index;
u32 active_ports = 0;
struct device_node *np, *rmu_node;
-   int rlen;
u32 ccsr;
u64 range_start;
-   int aw;
u32 i;
static int tmp;
struct device_node *rmu_np[MAX_MSG_UNIT_NUM] = {NULL};
@@ -528,15 +526,12 @@ int fsl_rio_setup(struct platform_device *dev)
dbell->bellirq = irq_of_parse_and_map(np, 1);
dev_info(&dev->dev, "bellirq: %d\n", dbell->bellirq);
 
-   aw = of_n_addr_cells(np);
-   dt_range = of_get_property(np, "reg", &rlen);
-   if (!dt_range) {
+   if (of_property_read_reg(np, 0, &range_start, NULL)) {
pr_err("%pOF: unable to find 'reg' property\n",
np);
rc = -ENOMEM;
goto err_pw;
}
-   range_start = of_read_number(dt_range, aw);
dbell->dbell_regs = (struct rio_dbell_regs *)(rmu_regs_win +
(u32)range_start);
 
@@ -556,15 +551,12 @@ int fsl_rio_setup(struct platform_device *dev)
pw->dev = &dev->dev;
pw->pwirq = irq_of_parse_and_map(np, 0);
dev_info(&dev->dev, "pwirq: %d\n", pw->pwirq);
-   aw = of_n_addr_cells(np);
-   dt_range = of_get_property(np, "reg", &rlen);
-   if (!dt_range) {
+   if (of_property_read_reg(np, 0, &range_start, NULL)) {
pr_err("%pOF: unable to find 'reg' property\n",
np);
rc = -ENOMEM;
goto err;
}
-   range_start = of_read_number(dt_range, aw);
pw->pw_regs = (struct rio_pw_regs *)(rmu_regs_win + (u32)range_start);
 
/*set up ports node*/
diff --git a/arch/powerpc/sysdev/fsl_rmu.c b/arch/powerpc/sysdev/fsl_rmu.c
index 7a5e2e2b9d06..e27c275c9c2e 100644
--- a/arch/powerpc/sysdev/fsl_rmu.c
+++ b/arch/powerpc/sysdev/fsl_rmu.c
@@ -1067,9 +1067,6 @@ int fsl_rio_setup_rmu(struct rio_mport *mport, struct device_node *node)
struct rio_priv *priv;
struct fsl_rmu *rmu;
u64 msg_start;
-   const u32 *msg_addr;
-   int mlen;
-   int aw;
 
if (!mport || !mport->priv)
return -EINVAL;
@@ -1086,16 +1083,12 @@ int fsl_rio_setup_rmu(struct rio_mport *mport, struct device_node *node)
if (!rmu)
return -ENOMEM;
 
-   aw = of_n_addr_cells(node);
-   msg_addr = of_get_property(node, "reg", &mlen);
-   if (!msg_addr) {
+   if (of_property_read_reg(node, 0, &msg_start, NULL)) {
pr_err("%pOF: unable to find 'reg' property of message-unit\n",
node);
kfree(rmu);
return -ENOMEM;
}
-   msg_start = of_read_number(msg_addr, aw);
-
rmu->msg_regs = (struct rio_msg_regs *)
(rmu_regs_win + (u32)msg_start);
 
-- 
2.39.2



[PATCH] cpufreq: pmac32: Use of_property_read_reg() to parse "reg"

2023-06-09 Thread Rob Herring
Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.

Signed-off-by: Rob Herring 
---
 drivers/cpufreq/pmac32-cpufreq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/pmac32-cpufreq.c b/drivers/cpufreq/pmac32-cpufreq.c
index ec75e79659ac..f53635ba16c1 100644
--- a/drivers/cpufreq/pmac32-cpufreq.c
+++ b/drivers/cpufreq/pmac32-cpufreq.c
@@ -378,10 +378,9 @@ static int pmac_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
 static u32 read_gpio(struct device_node *np)
 {
-   const u32 *reg = of_get_property(np, "reg", NULL);
-   u32 offset;
+   u64 offset;
 
-   if (reg == NULL)
+   if (of_property_read_reg(np, 0, &offset, NULL) < 0)
return 0;
/* That works for all keylargos but shall be fixed properly
 * some day... The problem is that it seems we can't rely
-- 
2.39.2



[PATCH] macintosh: Use of_property_read_reg() to parse "reg"

2023-06-09 Thread Rob Herring
Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.

Signed-off-by: Rob Herring 
---
 drivers/macintosh/smu.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/macintosh/smu.c b/drivers/macintosh/smu.c
index b495bfa77896..5183a00529f5 100644
--- a/drivers/macintosh/smu.c
+++ b/drivers/macintosh/smu.c
@@ -33,7 +33,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -470,7 +471,7 @@ EXPORT_SYMBOL(smu_present);
 int __init smu_init (void)
 {
struct device_node *np;
-   const u32 *data;
+   u64 data;
int ret = 0;
 
 np = of_find_node_by_type(NULL, "smu");
@@ -514,8 +515,7 @@ int __init smu_init (void)
ret = -ENXIO;
goto fail_bootmem;
}
-   data = of_get_property(smu->db_node, "reg", NULL);
-   if (data == NULL) {
+   if (of_property_read_reg(smu->db_node, 0, &data, NULL)) {
printk(KERN_ERR "SMU: Can't find doorbell GPIO address !\n");
ret = -ENXIO;
goto fail_db_node;
@@ -525,7 +525,7 @@ int __init smu_init (void)
 * and ack. GPIOs are at 0x50, best would be to find that out
 * in the device-tree though.
 */
-   smu->doorbell = *data;
+   smu->doorbell = data;
if (smu->doorbell < 0x50)
smu->doorbell += 0x50;
 
@@ -534,13 +534,12 @@ int __init smu_init (void)
smu->msg_node = of_find_node_by_name(NULL, "smu-interrupt");
if (smu->msg_node == NULL)
break;
-   data = of_get_property(smu->msg_node, "reg", NULL);
-   if (data == NULL) {
+   if (of_property_read_reg(smu->msg_node, 0, &data, NULL)) {
of_node_put(smu->msg_node);
smu->msg_node = NULL;
break;
}
-   smu->msg = *data;
+   smu->msg = data;
if (smu->msg < 0x50)
smu->msg += 0x50;
} while(0);
-- 
2.39.2



Re: [PATCH v9 2/4] tpm: of: Make of-tree specific function commonly available

2023-06-09 Thread Jarkko Sakkinen
On Thu May 25, 2023 at 1:56 AM EEST, Jerry Snitselaar wrote:
> On Tue, Apr 18, 2023 at 09:44:07AM -0400, Stefan Berger wrote:
> > Simplify tpm_read_log_of() by moving reusable parts of the code into
> > an inline function that makes it commonly available so it can be
> > used also for kexec support. Call the new of_tpm_get_sml_parameters()
> > function from the TPM Open Firmware driver.
> > 
> > Signed-off-by: Stefan Berger 
> > Cc: Jarkko Sakkinen 
> > Cc: Jason Gunthorpe 
> > Cc: Rob Herring 
> > Cc: Frank Rowand 
> > Reviewed-by: Mimi Zohar 
> > Tested-by: Nageswara R Sastry 
> > Tested-by: Coiby Xu 
> > Acked-by: Jarkko Sakkinen 
> > 
>
> Reviewed-by: Jerry Snitselaar 

If I just pick tpm only patches they won't apply so maybe TPM changes
should be better separated if that is by any means possible.

Open for counter proposals. Just my thoughts...

I.e. I'm mainly wondering why TPM patches depend on IMA patches?

BR, Jarkko



Re: [PATCH 00/13] mm: jit/text allocator

2023-06-09 Thread Song Liu
On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport  wrote:
>
> On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland  wrote:
> >
> > [...]
> >
> > > > > > Can you give more detail on what parameters you need? If the only 
> > > > > > extra
> > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > text", that's not that big of a deal.
> > > > >
> > > > > My thinking was that we at least need the start + end for each 
> > > > > caller. That
> > > > > might be it, tbh.
> > > >
> > > > Do you mean that modules will have something like
> > > >
> > > >   jit_text_alloc(size, MODULES_START, MODULES_END);
> > > >
> > > > and kprobes will have
> > > >
> > > >   jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > ?
> > >
> > > Yes.
> >
> > How about we start with two APIs:
> >  jit_text_alloc(size);
> >  jit_text_alloc_range(size, start, end);
> >
> > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > not quite convinced it is needed.
>
> Right now arm64 and riscv override bpf and kprobes allocations to use the
> entire vmalloc address space, but having the ability to allocate generated
> code outside of modules area may be useful for other architectures.
>
> Still the start + end for the callers feels backwards to me because the
> callers do not define the ranges, but rather the architectures, so we still
> need a way for architectures to define how they want to allocate memory for
> the generated code.

Yeah, this makes sense.

>
> > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > adding enum jit_type parameter to jit_text_alloc().
> > >
> > > That feels backwards to me; it centralizes a bunch of information about
> > > distinct users to be able to shove that into a static array, when the 
> > > callsites
> > > can pass that information.
> >
> > I think we have only two types of users: module and everything else
> > (ftrace, kprobe, bpf stuff). The key differences are:
> >
> >   1. module uses text and data; while everything else only uses text.
> >   2. module code is generated by the compiler, and thus has stronger
> >   requirements in address ranges; everything else are generated via some
> >   JIT or manual written assembly, so they are more flexible with address
> >   ranges (in JIT, we can avoid using instructions that requires a specific
> >   address range).
> >
> > The next question is, can we have the two types of users share the same
> > address ranges? If not, we can reserve the preferred range for modules,
> > and let everything else use the other range. I don't see reasons to further
> > separate users in the "everything else" group.
>
> I agree that we can define only two types: modules and everything else and
> let the architectures define if they need different ranges for these two
> types, or want the same range for everything.
>
> With only two types we can have two API calls for alloc, and a single
> structure that defines the ranges etc from the architecture side rather
> than spread all over.
>
> Like something along these lines:
>
> struct execmem_range {
> unsigned long   start;
> unsigned long   end;
> unsigned long   fallback_start;
> unsigned long   fallback_end;
> pgprot_t        pgprot;
> unsigned intalignment;
> };
>
> struct execmem_modules_range {
> enum execmem_module_flags flags;
> struct execmem_range text;
> struct execmem_range data;
> };
>
> struct execmem_jit_range {
> struct execmem_range text;
> };
>
> struct execmem_params {
> struct execmem_modules_range    modules;
> struct execmem_jit_range        jit;
> };
>
> struct execmem_params *execmem_arch_params(void);
>
> void *execmem_text_alloc(size_t size);
> void *execmem_data_alloc(size_t size);
> void execmem_free(void *ptr);

With the jit variation, maybe we can just call these
module_[text|data]_alloc()?

btw: Depending on the implementation of the allocator, we may also
need separate free()s for text and data.

>
> void *jit_text_alloc(size_t size);
> void jit_free(void *ptr);
>

[...]

How should we move ahead from here?

AFAICT, all these changes can be easily extended and refactored
in the future, so we don't have to make it perfect the first time.
OTOH, having the interface committed (either this set or my
module_alloc_type version) can unblock work on the binpack
allocator and on the users' side. Therefore, I think we can move
relatively fast here?

Thanks,
Song


Re: [PATCH v2] powerpc/fadump: invoke ibm,os-term with rtas_call_unlocked()

2023-06-09 Thread Mahesh J Salgaonkar
On 2023-06-09 12:44:04 Fri, Hari Bathini wrote:
> Invoke ibm,os-term call with rtas_call_unlocked(), without using the
> RTAS spinlock, to avoid deadlock in the unlikely event of a machine
> crash while making an RTAS call.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/rtas.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index c087320f..a8192e5b1a5f 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1587,6 +1587,7 @@ static bool ibm_extended_os_term;
>  void rtas_os_term(char *str)
>  {
>   s32 token = rtas_function_token(RTAS_FN_IBM_OS_TERM);
> + static struct rtas_args args;
>   int status;
>  
>   /*
> @@ -1607,7 +1608,8 @@ void rtas_os_term(char *str)
>* schedules.
>*/
>   do {
> - status = rtas_call(token, 1, 1, NULL, __pa(rtas_os_term_buf));
> + rtas_call_unlocked(&args, token, 1, 1, NULL, __pa(rtas_os_term_buf));
> + status = be32_to_cpu(args.rets[0]);

Looks good to me.

Reviewed-by: Mahesh Salgaonkar 

>   } while (rtas_busy_delay_time(status));

Thanks,
-Mahesh.


[PATCH v4 1/2] powerpc/legacy_serial: Handle SERIAL_8250_FSL=n build failures

2023-06-09 Thread Uwe Kleine-König
With SERIAL_8250=y and SERIAL_8250_FSL_CONSOLE=n, both
IS_ENABLED(CONFIG_SERIAL_8250) and IS_REACHABLE(CONFIG_SERIAL_8250)
evaluate to true and so fsl8250_handle_irq() is used. However, this
function is only available if CONFIG_SERIAL_8250_CONSOLE=y (and thus
SERIAL_8250_FSL=y).

To prepare SERIAL_8250_FSL becoming tristate and being enabled in more
cases, check for IS_REACHABLE(CONFIG_SERIAL_8250_FSL) before making use
of fsl8250_handle_irq(). This check is correct with and without the
change to make SERIAL_8250_FSL modular.

Reported-by: Randy Dunlap 
Fixes: 66eff0ef528b ("powerpc/legacy_serial: Warn about 8250 devices operated without active FSL workarounds")
Signed-off-by: Uwe Kleine-König 
---
 arch/powerpc/kernel/legacy_serial.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c
index fdbd85aafeb1..6ee65741dbd5 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -510,7 +510,7 @@ static void __init fixup_port_irq(int index,
 
if (IS_ENABLED(CONFIG_SERIAL_8250) &&
of_device_is_compatible(np, "fsl,ns16550")) {
-   if (IS_REACHABLE(CONFIG_SERIAL_8250)) {
+   if (IS_REACHABLE(CONFIG_SERIAL_8250_FSL)) {
port->handle_irq = fsl8250_handle_irq;
port->has_sysrq = 
IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
} else {
-- 
2.39.2



[PATCH v4 0/2] serial: 8250: Apply FSL workarounds also without SERIAL_8250_CONSOLE

2023-06-09 Thread Uwe Kleine-König
Hello,

this is the fourth iteration of trying to make the FSL workaround code
active even without 8250 console support.

The first patch is a fix for commit 66eff0ef528b (powerpc/legacy_serial:
Warn about 8250 devices operated without active FSL workarounds) that
currently is in tty-next. This patch originates from my v3 that was only
partially applied. (That is a lame excuse though. While applying the
full series would not have shown this problem, bisection would still
have a problem.)

The second patch makes SERIAL_8250_FSL tristate and thus allows this to
be enabled also with SERIAL_8250=m. This is also the relevant change
since v3, where 8250_fsl.o was linked into 8250-base.ko.

This series is build tested on amd64 and powerpc with all 27 possible
configurations for

SERIAL_8250={y,m,n}
SERIAL_8250_FSL={y,m,n}
SERIAL_OF_PLATFORM={y,m,n}

using:

choices=(y m n)
for i in $(seq 0 26); do
	perl -p -e "s/SERIAL_8250=y/SERIAL_8250=${choices[$(((i / 9) % 3))]}/; s/SERIAL_8250_FSL=y/SERIAL_8250_FSL=${choices[$(((i / 3) % 3))]}/; s/SERIAL_OF_PLATFORM=y/SERIAL_OF_PLATFORM=${choices[$((i % 3))]}/;" .config-pre > .config &&
	make -j 12 ||
	break;
done

with .config-pre having COMPILE_TEST=y so this time there shouldn't be a
build regression. (Not all 27 variants are possible, so some valid
configurations are tested twice or more, but that's still good enough.)

The patches have no strong dependency on each other, so they could go in
via different trees. But given that 66eff0ef528b is in tty-next, taking
both via tty sounds most sensible.

Best regards
Uwe

Uwe Kleine-König (2):
  powerpc/legacy_serial: Handle SERIAL_8250_FSL=n build failures
  serial: 8250: Apply FSL workarounds also without SERIAL_8250_CONSOLE

 arch/powerpc/kernel/legacy_serial.c | 2 +-
 drivers/tty/serial/8250/8250_fsl.c  | 3 +++
 drivers/tty/serial/8250/8250_of.c   | 2 +-
 drivers/tty/serial/8250/Kconfig | 6 +++---
 4 files changed, 8 insertions(+), 5 deletions(-)


base-commit: 66eff0ef528b6d6e9a45b68f6cd969dcbe7b800a
-- 
2.39.2



Re: [PATCH] powerpc/signal32: Force inlining of __unsafe_save_user_regs() and save_tm_user_regs_unsafe()

2023-06-09 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Mon Jun 5, 2023 at 6:58 PM AEST, Christophe Leroy wrote:
>> Looking at generated code for handle_signal32() shows calls to a
>> function called __unsafe_save_user_regs.constprop.0 while user access
>> is open.
>>
>> And that __unsafe_save_user_regs.constprop.0 function has two nops at
>> the beginning, allowing it to be traced, which is unexpected during
>> user access open window.
>>
>> The solution could be to mark __unsafe_save_user_regs() no trace, but
>> to be on the safe side the most efficient is to flag it __always_inline
>> as already done for function __unsafe_restore_general_regs(). The
>> function is relatively small and only called twice, so the size
>> increase will remain in the noise.
>>
>> Do the same with save_tm_user_regs_unsafe() as it may suffer the
>> same issue.
>
> Could you put a comment so someone doesn't uninline it later?

I think the "unsafe" in the name is probably sufficient to warn people
off, but you never know. Still I'd happily take a patch to add comments :)

> Marking it notrace as well would be sufficient for a comment, if that works.

I nearly did that when applying, but I'm not sure it won't change the
code generation, so I left it as-is.

cheers


Re: kvm/arm64: Spark benchmark

2023-06-09 Thread Marc Zyngier
On Fri, 09 Jun 2023 01:59:35 +0100,
Yu Zhao  wrote:
> 
> TLDR
> 
> Apache Spark spent 12% less time sorting four billion random integers twenty 
> times (in ~4 hours) after this patchset [1].

Why are the 3 architectures you have considered being evaluated with 3
different benchmarks? I am not suspecting you to have cherry-picked
the best results, but I'd really like to see a variety of benchmarks
that exercise this stuff differently.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures

2023-06-09 Thread Michael Ellerman
Christophe Leroy  writes:
> On 12/05/2023 at 18:09, Marco Elver wrote:
>> On Fri, 12 May 2023 at 17:31, Christophe Leroy
>>  wrote:
>>>
>>> Activating KCSAN on a 32 bits architecture leads to the following
>>> link-time failure:
>>>
>>>  LD  .tmp_vmlinux.kallsyms1
>>>powerpc64-linux-ld: kernel/kcsan/core.o: in function 
>>> `__tsan_atomic64_load':
>>>kernel/kcsan/core.c:1273: undefined reference to `__atomic_load_8'
>>>powerpc64-linux-ld: kernel/kcsan/core.o: in function 
>>> `__tsan_atomic64_store':
>>>kernel/kcsan/core.c:1273: undefined reference to `__atomic_store_8'
...
>>>
>>> 32 bits architectures don't have 64 bits atomic builtins. Only
>>> include DEFINE_TSAN_ATOMIC_OPS(64) on 64 bits architectures.
>>>
>>> Fixes: 0f8ad5f2e934 ("kcsan: Add support for atomic builtins")
>>> Suggested-by: Marco Elver 
>>> Signed-off-by: Christophe Leroy 
>> 
>> Reviewed-by: Marco Elver 
>> 
>> Do you have your own tree to take this through with the other patches?
>
> I don't have my own tree but I guess that it can be taken by Michael for 
> 6.5 via powerpc tree with acks from you and Max.
>
> Michael is that ok for you ?

Yeah I can take it.

cheers


[PATCH 10/10] docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document affinity_domain_via_partition sysfs interface file

2023-06-09 Thread Kajol Jain
Add details of the new hv-gpci interface file called
"affinity_domain_via_partition" in the ABI documentation.

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_gpci| 15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
index 1a5636ed3a4b..6bca74cf3220 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -154,3 +154,18 @@ Description:   admin read only
  removed, this sysfs file still be created and give error when 
reading it.
* The end user reading this sysfs file need to decode sysfs 
file data as per
  underneath platform/firmware.
+
+What:  /sys/devices/hv_gpci/interface/affinity_domain_via_partition
+Date:  June 2023
+Contact:   Linux on PowerPC Developer List 
+Description:   admin read only
+   This sysfs file exposes the system topology information by 
making HCALL
+   H_GET_PERF_COUNTER_INFO. The HCALL is made with counter request 
value
+   AFFINITY_DOMAIN_INFORMATION_BY_PARTITION(0xB1).
+   * This sysfs file is only created for power10 and above platforms.
+   * Users need root access to read data from this sysfs file.
+   * In case the HCALL fails with a hardware/permission issue, or support
+ for the AFFINITY_DOMAIN_INFORMATION_BY_PARTITION counter request value
+ is removed, this sysfs file is still created and gives an error when
+ reading it.
+   * The end user reading this sysfs file needs to decode the sysfs file
+ data as per the underlying platform/firmware.
-- 
2.35.3



[PATCH 09/10] powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via partition information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO with counter request value
AFFINITY_DOMAIN_INFORMATION_BY_PARTITION (0xB1) can be used to get
the system affinity domain via partition information. To expose this
information, the patch adds a sysfs file called
"affinity_domain_via_partition" to "/sys/devices/hv_gpci/interface/"
in the hv_gpci pmu driver.

Add macro AFFINITY_DOMAIN_VIA_PAR, which points to the counter request
value for "affinity_domain_via_partition", in hv-gpci.h file. Also add a
new function called "affinity_domain_via_partition_result_parse" to parse
the hcall result and store it in output buffer.

The affinity_domain_via_partition sysfs file is only available for power10
and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_PAR_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_partition attribute in
interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR
macro in hv-gpci.h file.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-gpci.c | 164 
 arch/powerpc/perf/hv-gpci.h |   4 +-
 2 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index b18f6f2d15b0..6e57c6065010 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -437,6 +437,158 @@ static ssize_t affinity_domain_via_domain_show(struct 
device *dev, struct device
return ret;
 }
 
+static void affinity_domain_via_partition_result_parse(int returned_values,
+   int element_size, char *buf, size_t *last_element,
+   size_t *n, struct hv_gpci_request_buffer *arg)
+{
+   size_t i = 0, j = 0;
+   size_t k, l, m;
+   uint16_t total_affinity_domain_ele, size_of_each_affinity_domain_ele;
+
+   /*
+* hcall H_GET_PERF_COUNTER_INFO populates the 'returned_values'
+* to show the total number of counter_value array elements
+* returned via hcall.
+* Unlike other request types, the data structure returned by this
+* request is variable-size. For this counter request type,
+* hcall populates 'cv_element_size', which corresponds to the minimum size
+* of the structure returned, i.e., the size of the structure with no domain
+* information. The loop below goes through the whole counter_value array
+* to determine the number and size of each domain array element and
+* adds it to the output buffer.
+*/
+   while (i < returned_values) {
+   k = j;
+   for (; k < j + element_size; k++)
+   *n += sprintf(buf + *n,  "%02x", (u8)arg->bytes[k]);
+   *n += sprintf(buf + *n,  "\n");
+
+   total_affinity_domain_ele = (u8)arg->bytes[k - 2] << 8 | 
(u8)arg->bytes[k - 3];
+   size_of_each_affinity_domain_ele = (u8)arg->bytes[k] << 8 | 
(u8)arg->bytes[k - 1];
+
+   for (l = 0; l < total_affinity_domain_ele; l++) {
+   for (m = 0; m < size_of_each_affinity_domain_ele; m++) {
+   *n += sprintf(buf + *n,  "%02x", 
(u8)arg->bytes[k]);
+   k++;
+   }
+   *n += sprintf(buf + *n,  "\n");
+   }
+
+   *n += sprintf(buf + *n,  "\n");
+   i++;
+   j = k;
+   }
+
+   *last_element = k;
+}
+
+static ssize_t affinity_domain_via_partition_show(struct device *dev, struct 
device_attribute *attr,
+   char *buf)
+{
+   struct hv_gpci_request_buffer *arg;
+   unsigned long ret;
+   size_t n = 0;
+   size_t last_element = 0;
+   u32 starting_index;
+
+   arg = (void *)get_cpu_var(hv_gpci_reqb);
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* Pass the counter request value 0xB1 corresponds to counter request
+* type 'Affinity_domain_information_by_partition',
+* to retrieve the system affinity domain by partition information.
+* starting_index value refers to the starting hardware
+* processor index.
+*/
+   arg->params.counter_request = cpu_to_be32(AFFINITY_DOMAIN_VIA_PAR);
+   arg->params.starting_index = cpu_to_be32(0);
+
+   ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+   virt_to_phys(arg), HGPCI_REQ_BUFFER_SIZE);
+
+   if (!ret)
+   goto parse_result;
+
+   /*
+* ret value as 'H_PARAMETER' implies that the current buffer size
+* can't accommodate all the information, and a partial buffer is
+* returned. To handle that, we need to make subsequent requests
+* with the next starting index to retrieve additional (missing) data.
+* The loop below does subsequent hcalls with the next starting index
+* and adds the data to the buffer until we get all the information.
+*/
+   

[PATCH 08/10] docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document affinity_domain_via_domain sysfs interface file

2023-06-09 Thread Kajol Jain
Add details of the new hv-gpci interface file called
"affinity_domain_via_domain" in the ABI documentation.

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_gpci| 15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
index d8862808c955..1a5636ed3a4b 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -139,3 +139,18 @@ Description:   admin read only
  removed, this sysfs file still be created and give error when 
reading it.
* The end user reading this sysfs file need to decode sysfs 
file data as per
  underneath platform/firmware.
+
+What:  /sys/devices/hv_gpci/interface/affinity_domain_via_domain
+Date:  June 2023
+Contact:   Linux on PowerPC Developer List 
+Description:   admin read only
+   This sysfs file exposes the system topology information by 
making HCALL
+   H_GET_PERF_COUNTER_INFO. The HCALL is made with counter request 
value
+   AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN(0xB0).
+   * This sysfs file is only created for power10 and above platforms.
+   * Users need root access to read data from this sysfs file.
+   * In case the HCALL fails with a hardware/permission issue, or support
+ for the AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN counter request value
+ is removed, this sysfs file is still created and gives an error when
+ reading it.
+   * The end user reading this sysfs file needs to decode the sysfs file
+ data as per the underlying platform/firmware.
-- 
2.35.3



[PATCH 07/10] powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via domain information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO with counter request value
AFFINITY_DOMAIN_INFORMATION_BY_DOMAIN (0xB0) can be used to get
the system affinity domain via domain information. To expose this
information, the patch adds a sysfs file called
"affinity_domain_via_domain" to "/sys/devices/hv_gpci/interface/"
in the hv_gpci pmu driver.

Add macro for AFFINITY_DOMAIN_VIA_DOM, which points to the counter
request value for "affinity_domain_via_domain" in hv-gpci.h file.

The affinity_domain_via_domain sysfs file is only available for power10
and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_DOM_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_domain attribute in interface_attrs
array. Also updated the value of INTERFACE_NULL_ATTR macro in hv-gpci.h
file.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-gpci.c | 77 +
 arch/powerpc/perf/hv-gpci.h |  4 +-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index cac726f06221..b18f6f2d15b0 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -372,6 +372,71 @@ static ssize_t 
affinity_domain_via_virtual_processor_show(struct device *dev,
return ret;
 }
 
+static ssize_t affinity_domain_via_domain_show(struct device *dev, struct 
device_attribute *attr,
+   char *buf)
+{
+   struct hv_gpci_request_buffer *arg;
+   unsigned long ret;
+   size_t n = 0;
+
+   arg = (void *)get_cpu_var(hv_gpci_reqb);
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* Pass the counter request 0xB0 corresponds to request
+* type 'Affinity_domain_information_by_domain',
+* to retrieve the system affinity domain information.
+* starting_index value refers to the starting hardware
+* processor index.
+*/
+   ret = systeminfo_gpci_request(AFFINITY_DOMAIN_VIA_DOM, 0, 0, buf, &n, 
arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+
+   /*
+* ret value as 'H_PARAMETER' corresponds to 'GEN_BUF_TOO_SMALL', which
+* implies that the buffer can't accommodate all the information, and a
+* partial buffer is returned. To handle that, we need to make subsequent
+* requests with the next starting index to retrieve additional (missing)
+* data. The loop below does subsequent hcalls with the next starting index
+* and adds the data to the buffer until we get all the information.
+*/
+   while (ret == H_PARAMETER) {
+   int returned_values = be16_to_cpu(arg->params.returned_values);
+   int elementsize = be16_to_cpu(arg->params.cv_element_size);
+   int last_element = (returned_values - 1) * elementsize;
+
+   /*
+* Since the starting index value is part of counter_value
+* buffer elements, use the starting index value in the last
+* element and add 1 to make subsequent hcalls.
+*/
+   u32 starting_index = arg->bytes[last_element + 1] +
+   (arg->bytes[last_element] << 8) + 1;
+
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   ret = systeminfo_gpci_request(AFFINITY_DOMAIN_VIA_DOM,
+   starting_index, 0, buf, &n, arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+   }
+
+   return n;
+
+out:
+   put_cpu_var(hv_gpci_reqb);
+   return ret;
+}
+
 static DEVICE_ATTR_RO(kernel_version);
 static DEVICE_ATTR_RO(cpumask);
 
@@ -403,6 +468,11 @@ static struct attribute *interface_attrs[] = {
 * attribute, set in init function if applicable.
 */
NULL,
+   /*
+* This NULL is a placeholder for the affinity_domain_via_domain
+* attribute, set in init function if applicable.
+*/
+   NULL,
NULL,
 };
 
@@ -639,6 +709,10 @@ static void sysinfo_device_attr_create(int 
sysinfo_interface_group_index)
attr->attr.name = "affinity_domain_via_virtual_processor";
attr->show = affinity_domain_via_virtual_processor_show;
break;
+   case INTERFACE_AFFINITY_DOMAIN_VIA_DOM_ATTR:
+   attr->attr.name = "affinity_domain_via_domain";
+   attr->show = affinity_domain_via_domain_show;
+   break;
}
 
attr->attr.mode = 0444;
@@ -658,6 +732,9 @@ static void add_sysinfo_interface_files(void)
 * interface_attrs attribute array
 */
sysinfo_device_attr_create(INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR);
+
+   /* Add affinity_domain_via_domain attribute in the interface_attrs 
attribute array */
+   sysinfo_device_attr_create(INTERFACE_AFFINITY_

[PATCH 06/10] docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document affinity_domain_via_virtual_processor sysfs interface file

2023-06-09 Thread Kajol Jain
Add details of the new hv-gpci interface file called
"affinity_domain_via_virtual_processor" in the ABI documentation.

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_gpci| 15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
index 003d94afbbcd..d8862808c955 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -124,3 +124,18 @@ Description:   admin read only
  created and give error when reading it.
* The end user reading this sysfs file need to decode sysfs 
file data as per
  underneath platform/firmware.
+
+What:  
/sys/devices/hv_gpci/interface/affinity_domain_via_virtual_processor
+Date:  June 2023
+Contact:   Linux on PowerPC Developer List 
+Description:   admin read only
+   This sysfs file exposes the system topology information by 
making HCALL
+   H_GET_PERF_COUNTER_INFO. The HCALL is made with counter request 
value
+   AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR(0xA0).
+   * This sysfs file is only created for power10 and above platforms.
+   * Users need root access to read data from this sysfs file.
+   * In case the HCALL fails with a hardware/permission issue, or support
+ for the AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR counter
+ request value is removed, this sysfs file is still created and gives
+ an error when reading it.
+   * The end user reading this sysfs file needs to decode the sysfs file
+ data as per the underlying platform/firmware.
-- 
2.35.3



[PATCH 05/10] powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via virtual processor information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO with counter request value
AFFINITY_DOMAIN_INFORMATION_BY_VIRTUAL_PROCESSOR (0xA0) can be used to get
the system affinity domain via virtual processor information. To expose
this information, the patch adds a sysfs file called
"affinity_domain_via_virtual_processor" to
"/sys/devices/hv_gpci/interface/" in the hv_gpci pmu driver.

Add macro for AFFINITY_DOMAIN_VIA_VP, which points to counter request value
for "affinity_domain_via_virtual_processor" in hv-gpci.h file.

The affinity_domain_via_virtual_processor sysfs file is only available for
power10 and above platforms. Add a macro called
INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR, which points to the index of NULL
placeholder, for affinity_domain_via_virtual_processor attribute in
interface_attrs array. Also updated the value of INTERFACE_NULL_ATTR macro
in hv-gpci.h file.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-gpci.c | 84 +
 arch/powerpc/perf/hv-gpci.h |  4 +-
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index c9fe74373e5f..cac726f06221 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -303,6 +303,75 @@ static ssize_t processor_config_show(struct device *dev, 
struct device_attribute
return ret;
 }
 
+static ssize_t affinity_domain_via_virtual_processor_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct hv_gpci_request_buffer *arg;
+   unsigned long ret;
+   size_t n = 0;
+
+   arg = (void *)get_cpu_var(hv_gpci_reqb);
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* Pass the counter request 0xA0 corresponds to request
+* type 'Affinity_domain_information_by_virtual_processor',
+* to retrieve the system affinity domain information.
+* starting_index value refers to the starting hardware
+* processor index.
+*/
+   ret = systeminfo_gpci_request(AFFINITY_DOMAIN_VIA_VP, 0, 0, buf, &n, 
arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+
+   /*
+* ret value as 'H_PARAMETER' corresponds to 'GEN_BUF_TOO_SMALL', which
+* implies that the buffer can't accommodate all the information, and a
+* partial buffer is returned. To handle that, we need to make subsequent
+* requests with the next secondary index to retrieve additional (missing)
+* data. The loop below does subsequent hcalls with the next secondary index
+* and adds the data to the buffer until we get all the information.
+*/
+   while (ret == H_PARAMETER) {
+   int returned_values = be16_to_cpu(arg->params.returned_values);
+   int elementsize = be16_to_cpu(arg->params.cv_element_size);
+   int last_element = (returned_values - 1) * elementsize;
+
+   /*
+* Since the starting index and secondary index type is part of 
the
+* counter_value buffer elements, use the starting index value 
in the
+* last array element as subsequent starting index, and use 
secondary index
+* value in the last array element plus 1 as subsequent 
secondary index.
+* For counter request '0xA0', starting index points to 
partition id
+* and secondary index points to corresponding virtual 
processor index.
+*/
+   u32 starting_index = arg->bytes[last_element + 1] + 
(arg->bytes[last_element] << 8);
+   u16 secondary_index = arg->bytes[last_element + 3] +
+   (arg->bytes[last_element + 2] << 8) + 1;
+
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   ret = systeminfo_gpci_request(AFFINITY_DOMAIN_VIA_VP, 
starting_index,
+   secondary_index, buf, &n, arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+   }
+
+   return n;
+
+out:
+   put_cpu_var(hv_gpci_reqb);
+   return ret;
+}
+
 static DEVICE_ATTR_RO(kernel_version);
 static DEVICE_ATTR_RO(cpumask);
 
@@ -329,6 +398,11 @@ static struct attribute *interface_attrs[] = {
 * attribute, set in init function if applicable.
 */
NULL,
+   /*
+* This NULL is a placeholder for the 
affinity_domain_via_virtual_processor
+* attribute, set in init function if applicable.
+*/
+   NULL,
NULL,
 };
 
@@ -561,6 +635,10 @@ static void sysinfo_device_attr_create(int 
sysinfo_interface_group_index)
attr->attr.name = "processor_config";
attr->show = processor_config_show;
break;
+   case INTERFACE_AFFINITY_DOMAIN_VIA_VP_ATTR:
+   attr->attr.name = "affinity_doma

[PATCH 04/10] docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document processor_config sysfs interface file

2023-06-09 Thread Kajol Jain
Add details of the new hv-gpci interface file called
"processor_config" in the ABI documentation.

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_gpci| 15 +++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
index 6d633167268e..003d94afbbcd 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -109,3 +109,18 @@ Description:   admin read only
   more information.
 
* "-EFBIG" : System information exceeds PAGE_SIZE.
+
+What:  /sys/devices/hv_gpci/interface/processor_config
+Date:  June 2023
+Contact:   Linux on PowerPC Developer List 
+Description:   admin read only
+   This sysfs file exposes the system topology information by 
making HCALL
+   H_GET_PERF_COUNTER_INFO. The HCALL is made with counter request 
value
+   PROCESSOR_CONFIG(0x90).
+   * This sysfs file is only created for power10 and above platforms.
+   * Users need root access to read data from this sysfs file.
+   * In case the HCALL fails with a hardware/permission issue, or support
+ for the PROCESSOR_CONFIG counter request value is removed, this sysfs
+ file is still created and gives an error when reading it.
+   * The end user reading this sysfs file needs to decode the sysfs file
+ data as per the underlying platform/firmware.
-- 
2.35.3
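
For illustration, the hex dump these sysfs files produce (one counter_value element per line, two hex digits per byte, as emitted by the driver's "%02x" loop) could be decoded along these lines. The helper names and the sample data in the usage note are hypothetical, not part of the patch:

```python
def parse_gpci_dump(text):
    # Each line of the sysfs file is one counter_value element,
    # printed by the driver as "%02x" per byte.
    return [bytes.fromhex(line) for line in text.split() if line]

def element_index(element):
    # Per the driver's retry logic, the first four bytes of an
    # element hold its starting index, big-endian.
    return int.from_bytes(element[:4], "big")
```

For example, `element_index(parse_gpci_dump("0000002aff\n")[0])` would yield 42 for that made-up element.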



[PATCH 03/10] powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor config information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO with counter request value
PROCESSOR_CONFIG(0x90) can be used to get the system
processor configuration information. To expose this
information, the patch adds a sysfs file called "processor_config"
to "/sys/devices/hv_gpci/interface/" of the hv_gpci pmu driver.

Add macro for PROCESSOR_CONFIG counter request value in hv-gpci.h file.
Also add a new function called "sysinfo_device_attr_create",
which creates and adds the required device attribute to the
interface_attrs array.

The processor_config sysfs file is only available for power10
and above platforms. Add a new macro called
INTERFACE_PROCESSOR_CONFIG_ATTR, which points to the index of
NULL placeholder for the processor_config attribute in the interface_attrs
array. Also add macro INTERFACE_NULL_ATTR which points to index of NULL
attribute in interface_attrs array.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-gpci.c | 110 +---
 arch/powerpc/perf/hv-gpci.h |   5 +-
 2 files changed, 107 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index bca24725699e..c9fe74373e5f 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -237,6 +237,72 @@ static ssize_t processor_bus_topology_show(struct device 
*dev, struct device_att
return ret;
 }
 
+static ssize_t processor_config_show(struct device *dev, struct 
device_attribute *attr,
+   char *buf)
+{
+   struct hv_gpci_request_buffer *arg;
+   unsigned long ret;
+   size_t n = 0;
+
+   arg = (void *)get_cpu_var(hv_gpci_reqb);
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* Pass the counter request value 0x90, which corresponds to
+* request type 'Processor_config', to retrieve
+* the system processor information.
+* starting_index value implies the starting hardware
+* processor index.
+*/
+   ret = systeminfo_gpci_request(PROCESSOR_CONFIG, 0, 0, buf, &n, arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+
+   /*
+* A ret value of 'H_PARAMETER' corresponds to 'GEN_BUF_TOO_SMALL',
+* which implies that the buffer can't accommodate all the information
+* and a partial buffer was returned. To handle that, we need to make
+* subsequent requests with the next starting index to retrieve the
+* additional (missing) data. The loop below does subsequent hcalls
+* with the next starting index and appends to the buffer until we
+* get all the information.
+*/
+   while (ret == H_PARAMETER) {
+   int returned_values = be16_to_cpu(arg->params.returned_values);
+   int elementsize = be16_to_cpu(arg->params.cv_element_size);
+   int last_element = (returned_values - 1) * elementsize;
+
+   /*
+* Since the starting index is part of counter_value
+* buffer elements, use the starting index value in the last
+* element and add 1 to subsequent hcalls.
+*/
+   u32 starting_index = arg->bytes[last_element + 3] +
+   (arg->bytes[last_element + 2] << 8) +
+   (arg->bytes[last_element + 1] << 16) +
+   (arg->bytes[last_element] << 24) + 1;
+
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   ret = systeminfo_gpci_request(PROCESSOR_CONFIG, starting_index, 
0, buf, &n, arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAMETER)
+   goto out;
+   }
+
+   return n;
+
+out:
+   put_cpu_var(hv_gpci_reqb);
+   return ret;
+}
+
 static DEVICE_ATTR_RO(kernel_version);
 static DEVICE_ATTR_RO(cpumask);
 
@@ -258,6 +324,11 @@ static struct attribute *interface_attrs[] = {
 * attribute, set in init function if applicable.
 */
NULL,
+   /*
+* This NULL is a placeholder for the processor_config
+* attribute, set in init function if applicable.
+*/
+   NULL,
NULL,
 };
 
@@ -463,21 +534,46 @@ static int hv_gpci_cpu_hotplug_init(void)
  ppc_hv_gpci_cpu_offline);
 }
 
-static void add_sysinfo_interface_files(void)
+static void sysinfo_device_attr_create(int sysinfo_interface_group_index)
 {
-   struct device_attribute *attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+   struct device_attribute *attr;
 
+   if (sysinfo_interface_group_index < 
INTERFACE_PROCESSOR_BUS_TOPOLOGY_ATTR ||
+   sysinfo_interface_group_index >= INTERFACE_NULL_ATTR) {
+   pr_info("Wrong interface group index for system information\n");
+   return;
+   }
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
if (!attr) {
-   pr_info("Memory alloc

[PATCH 02/10] docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document processor_bus_topology sysfs interface file

2023-06-09 Thread Kajol Jain
Add details of the new hv-gpci interface file called
"processor_bus_topology" in the ABI documentation.

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_gpci| 29 +++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
index 12e2bf92783f..6d633167268e 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -80,3 +80,32 @@ Contact: Linux on PowerPC Developer List 

 Description:   read only
This sysfs file exposes the cpumask which is designated to make
HCALLs to retrieve hv-gpci pmu event counter data.
+
+What:  /sys/devices/hv_gpci/interface/processor_bus_topology
+Date:  June 2023
+Contact:   Linux on PowerPC Developer List 
+Description:   admin read only
+   This sysfs file exposes the system topology information by
+   making the HCALL H_GET_PERF_COUNTER_INFO with counter request
+   value PROCESSOR_BUS_TOPOLOGY(0xD0).
+   * This sysfs file is only created for power10 and above platforms.
+   * Users need root access to read data from this sysfs file.
+   * In case the HCALL fails with a hardware/permission issue, or
+ support for the PROCESSOR_BUS_TOPOLOGY counter request value is
+ removed, this sysfs file is still created and reading it returns
+ an error.
+   * The end user reading this sysfs file needs to decode the data as
+ per the underlying platform/firmware.
+
+   Possible error codes while reading this sysfs file:
+
+   * "-EPERM" : Partition is not permitted to retrieve performance
+   information; the "Enable Performance Information
+   Collection" option needs to be set.
+
+   * "-EOPNOTSUPP" : Requested system information is not available 
for the firmware
+ level and platform.
+
+   * "-EIO" : Can't retrieve system information because of invalid 
buffer length/invalid address
+  or because of some hardware error. Refer to the
+  getPerfCountInfo documentation for
+  more information.
+
+   * "-EFBIG" : System information exceeds PAGE_SIZE.
-- 
2.35.3
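
The hcall-status-to-errno mapping documented above can be sketched for clarity. The status constants below are hypothetical placeholders, not the real PAPR values, and `gpci_status_to_errno` is an illustrative name, not a function from the patch:

```python
import errno

# Hypothetical placeholder status values, not the real PAPR constants.
H_SUCCESS, H_PARAMETER, H_AUTHORITY, H_NOT_AVAILABLE = 0, 1, 2, 3

def gpci_status_to_errno(ret):
    # Mirrors the error mapping documented for these sysfs files.
    if ret in (H_SUCCESS, H_PARAMETER):
        return 0              # success, or a partial buffer to retry
    if ret == H_AUTHORITY:
        return -errno.EPERM   # performance info collection not enabled
    if ret == H_NOT_AVAILABLE:
        return -errno.EOPNOTSUPP  # not available on firmware/platform
    return -errno.EIO         # e.g. H_PRIVILEGE/H_HARDWARE
```

(The -EFBIG case is handled separately, when the decoded output exceeds PAGE_SIZE.)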



[PATCH 00/10] Add sysfs interface files to hv_gpci device to expose system information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO can be used to get data related to
chips, dimms and system topology, by passing different counter request
values.
This patchset adds sysfs files to "/sys/devices/hv_gpci/interface/"
of the hv_gpci pmu driver, which expose system topology information
using the H_GET_PERF_COUNTER_INFO hcall. The added sysfs files are
available for power10 and above platforms and need root access
to read the data.

Patches 1,3,5,7,9 adds sysfs interface files to the hv_gpci
pmu driver, to get system topology information.

List of added sysfs files:
-> processor_bus_topology (Counter request value : 0xD0)
-> processor_config (Counter request value : 0x90)
-> affinity_domain_via_virtual_processor (Counter request value : 0xA0)
-> affinity_domain_via_domain (Counter request value : 0xB0)
-> affinity_domain_via_partition (Counter request value : 0xB1)

Patches 2,4,6,8,10 adds details of the newly added hv_gpci
interface files listed above in the ABI documentation.

Kajol Jain (10):
  powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show
processor bus topology information
  docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document
processor_bus_topology sysfs interface file
  powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show
processor config information
  docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document
processor_config sysfs interface file
  powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity
domain via virtual processor information
  docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document
affinity_domain_via_virtual_processor sysfs interface file
  powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity
domain via domain information
  docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document
affinity_domain_via_domain sysfs interface file
  powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity
domain via partition information
  docs: ABI: sysfs-bus-event_source-devices-hv_gpci: Document
affinity_domain_via_partition sysfs interface file

 .../sysfs-bus-event_source-devices-hv_gpci|  89 +++
 arch/powerpc/perf/hv-gpci.c   | 584 +-
 arch/powerpc/perf/hv-gpci.h   |  15 +
 3 files changed, 686 insertions(+), 2 deletions(-)

-- 
2.35.3
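
The partial-buffer handling used by patches 1 and 3 — reissue the hcall with the starting index taken from the last returned element plus one, for as long as the status reports "buffer too small" — can be sketched as follows. The `hcall` callable and the status values are hypothetical stand-ins for the real H_GET_PERF_COUNTER_INFO interface:

```python
H_SUCCESS = 0
H_PARAMETER = -4  # hypothetical stand-in for GEN_BUF_TOO_SMALL

def fetch_all(hcall, req):
    # Accumulate counter_value elements across hcalls; on a partial
    # buffer, resume from (index of last returned element) + 1.
    out, start = [], 0
    while True:
        status, elements = hcall(req, start)
        out.extend(elements)
        if status == H_SUCCESS:
            return out
        if status != H_PARAMETER:
            raise OSError("hcall failed: %r" % status)
        # The first four bytes of an element carry its index, big-endian.
        start = int.from_bytes(out[-1][:4], "big") + 1
```

A fake paginated `hcall` returning two elements per call reproduces the driver's loop behaviour end to end.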



[PATCH 01/10] powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus topology information

2023-06-09 Thread Kajol Jain
The hcall H_GET_PERF_COUNTER_INFO with counter request value as
PROCESSOR_BUS_TOPOLOGY(0XD0), can be used to get the system
topology information. To expose the system topology information,
the patch adds a sysfs file called "processor_bus_topology" to
"/sys/devices/hv_gpci/interface/" of the hv_gpci pmu driver.

Add macro for PROCESSOR_BUS_TOPOLOGY counter request value
in hv-gpci.h file. Also add a new function called
"systeminfo_gpci_request", to make the H_GET_PERF_COUNTER_INFO hcall
with added macro, and populates the output buffer.

The processor_bus_topology sysfs file is only available for power10
and above platforms. Add a new function called
"add_sysinfo_interface_files", which will add processor_bus_topology
attribute in the interface_attrs array, only for power10 and
above platforms.
Also add macro INTERFACE_PROCESSOR_BUS_TOPOLOGY_ATTR in hv-gpci.h
file, which points to the index of the NULL placeholder for the
processor_bus_topology attribute.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-gpci.c | 163 +++-
 arch/powerpc/perf/hv-gpci.h |   6 ++
 2 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 7ff8ff3509f5..bca24725699e 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -102,6 +102,141 @@ static ssize_t cpumask_show(struct device *dev,
return cpumap_print_to_pagebuf(true, buf, &hv_gpci_cpumask);
 }
 
+static DEFINE_PER_CPU(char, hv_gpci_reqb[HGPCI_REQ_BUFFER_SIZE]) 
__aligned(sizeof(uint64_t));
+
+static unsigned long systeminfo_gpci_request(u32 req, u32 starting_index,
+   u16 secondary_index, char *buf,
+   size_t *n, struct hv_gpci_request_buffer *arg)
+{
+   unsigned long ret;
+   size_t i, j;
+
+   arg->params.counter_request = cpu_to_be32(req);
+   arg->params.starting_index = cpu_to_be32(starting_index);
+   arg->params.secondary_index = cpu_to_be16(secondary_index);
+
+   ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+   virt_to_phys(arg), HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* A ret value of 'H_PARAMETER' corresponds to 'GEN_BUF_TOO_SMALL',
+* which means that the current buffer size cannot accommodate
+* all the information and a partial buffer is returned.
+* The hcall fails in case of a ret value other than H_SUCCESS or
+* H_PARAMETER.
+*
+* A ret value of H_AUTHORITY implies that the partition is not
+* permitted to retrieve performance information, and is required to
+* set the "Enable Performance Information Collection" option.
+*/
+   if (ret == H_AUTHORITY)
+   return -EPERM;
+
+   /*
+* ret value as H_NOT_AVAILABLE implies that requested system 
information is
+* not available for the firmware level and platform.
+*/
+   if (ret == H_NOT_AVAILABLE)
+   return -EOPNOTSUPP;
+
+   /*
+* The hcall can fail with other possible ret values like
+* H_PRIVILEGE/H_HARDWARE, because of an invalid buffer
+* length/address or due to some hardware error.
+*/
+   if (ret && (ret != H_PARAMETER))
+   return -EIO;
+
+   /*
+* hcall H_GET_PERF_COUNTER_INFO populates the 'returned_values'
+* to show the total number of counter_value array elements
+* returned via hcall.
+* The hcall also populates 'cv_element_size', which corresponds to
+* the individual counter_value array element size. The loop below
+* goes through all counter_value array elements as per their size
+* and adds them to the output buffer.
+*/
+   for (i = 0; i < be16_to_cpu(arg->params.returned_values); i++) {
+   j = i * be16_to_cpu(arg->params.cv_element_size);
+
+   for (; j < (i + 1) * be16_to_cpu(arg->params.cv_element_size); 
j++)
+   *n += sprintf(buf + *n,  "%02x", (u8)arg->bytes[j]);
+   *n += sprintf(buf + *n,  "\n");
+   }
+
+   if (*n >= PAGE_SIZE) {
+   pr_info("System information exceeds PAGE_SIZE\n");
+   return -EFBIG;
+   }
+
+   return ret;
+}
+
+static ssize_t processor_bus_topology_show(struct device *dev, struct 
device_attribute *attr,
+   char *buf)
+{
+   struct hv_gpci_request_buffer *arg;
+   unsigned long ret;
+   size_t n = 0;
+
+   arg = (void *)get_cpu_var(hv_gpci_reqb);
+   memset(arg, 0, HGPCI_REQ_BUFFER_SIZE);
+
+   /*
+* Pass the counter request value 0xD0, which corresponds to
+* request type 'Processor_bus_topology', to retrieve
+* the system topology information.
+* starting_index value implies the starting hardware
+* chip id.
+*/
+   ret = systeminfo_gpci_request(PROCESSOR_BUS_TOPOLOGY, 0, 0, buf, &n, 
arg);
+
+   if (!ret)
+   return n;
+
+   if (ret != H_PARAM

[PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-09 Thread Nicholas Piggin
The most expensive ordering for hwsync to provide is the store-load
barrier, because all prior stores have to be drained to the caches
before subsequent instructions can complete.

stsync just orders stores, which means it can just be a barrier that
goes down the store queue and orders draining, and does not prevent
completion of subsequent instructions. So it should be faster than
hwsync.

Use stsync for wmb(). Older processors that don't recognise the SC
field should treat this as hwsync.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index f0ff5737b0d8..95e637c1a3b6 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -39,7 +39,7 @@
  */
 #define __mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define __rmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define __wmb()  __asm__ __volatile__ ("sync" : : : "memory")
+#define __wmb()  __asm__ __volatile__ (PPC_STSYNC : : : "memory")
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-- 
2.40.1
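
The SC-field encoding relied on above (added as PPC_RAW_SYNC(l, sc) in patch 2 of this series) can be sanity-checked with a small sketch. The values follow directly from the bit positions in the patch; `ppc_raw_sync` is an illustrative helper, not kernel code:

```python
def ppc_raw_sync(l, sc):
    # sync base opcode 0x7c0004ac with the L field in bits 21-23
    # and the SC field in bits 16-17, as in PPC_RAW_SYNC(l, sc).
    return 0x7c0004ac | ((l & 0x7) << 21) | ((sc & 0x3) << 16)

HWSYNC    = ppc_raw_sync(0, 0)  # plain sync / hwsync
STNCISYNC = ppc_raw_sync(1, 1)
STCISYNC  = ppc_raw_sync(0, 2)
STSYNC    = ppc_raw_sync(0, 3)  # used for wmb() in this patch
PHWSYNC   = ppc_raw_sync(4, 0)
PLWSYNC   = ppc_raw_sync(5, 0)
```

Note that PHWSYNC/PLWSYNC come out as 0x7c8004ac/0x7ca004ac, matching the hardcoded defines the series removes.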



[PATCH 3/4] powerpc/64s: Use stncisync instruction for smp_wmb() when available

2023-06-09 Thread Nicholas Piggin
stncisync orders less than lwsync (only cacheable store-store, not
load-load or load-store) so it should be as cheap or cheaper.

Microbenchmarks with no actual loads to order show that the basic
execution cost is the same on POWER10.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/barrier.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index b95b666f0374..f0ff5737b0d8 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -6,6 +6,8 @@
 #define _ASM_POWERPC_BARRIER_H
 
 #include 
+#include 
+#include 
 
 #ifndef __ASSEMBLY__
 #include 
@@ -41,7 +43,12 @@
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-#define SMPWMB  LWSYNC
+#define SMPWMB \
+   BEGIN_FTR_SECTION;  \
+   LWSYNC; \
+   FTR_SECTION_ELSE;   \
+   .long PPC_RAW_STNCISYNC();  \
+   ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_31)
 #elif defined(CONFIG_BOOKE)
 #define SMPWMB  mbar
 #else
-- 
2.40.1



[PATCH 1/4] powerpc: Make mmiowb a wmb

2023-06-09 Thread Nicholas Piggin
mmiowb must ensure MMIO stores inside a spin lock critical section on
one CPU will not be seen by the device after another CPU takes the
same lock and performs MMIOs.

This just requires cache inhibited stores to be ordered with the store
to unlock the spinlock, so wmb() can be used.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/mmiowb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmiowb.h 
b/arch/powerpc/include/asm/mmiowb.h
index 74a00127eb20..cd071fb97eba 100644
--- a/arch/powerpc/include/asm/mmiowb.h
+++ b/arch/powerpc/include/asm/mmiowb.h
@@ -9,7 +9,7 @@
 #include 
 
 #define arch_mmiowb_state()(&local_paca->mmiowb_state)
-#define mmiowb()   mb()
+#define mmiowb()   wmb()
 
 #endif /* CONFIG_MMIOWB */
 
-- 
2.40.1



[PATCH 2/4] powerpc/64s: Add POWER10 store sync mnemonics

2023-06-09 Thread Nicholas Piggin
ISA v3.1 introduces new sync types for store ordering.

  stncisync
  stcisync
  stsync

Add ppc-opcode defines for these. This changes PPC_RAW_SYNC to take
L,SC parameters and adds a PPC_RAW_HWSYNC for callers that want the
plain old sync (aka hwsync).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc-opcode.h | 19 ++-
 arch/powerpc/kernel/traps.c   |  2 +-
 arch/powerpc/lib/feature-fixups.c |  6 +++---
 arch/powerpc/net/bpf_jit_comp64.c |  2 +-
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index ca5a0da7df4e..7bc8bbcd4adb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -326,6 +326,8 @@
 #define ___PPC_R(r)(((r) & 0x1) << 16)
 #define ___PPC_PRS(prs)(((prs) & 0x1) << 17)
 #define ___PPC_RIC(ric)(((ric) & 0x3) << 18)
+#define ___PPC_L(l)(((l) & 0x7) << 21)
+#define ___PPC_SC(sc)  (((sc) & 0x3) << 16)
 #define __PPC_RA(a)___PPC_RA(__REG_##a)
 #define __PPC_RA0(a)   ___PPC_RA(__REGA0_##a)
 #define __PPC_RB(b)___PPC_RB(__REG_##b)
@@ -378,8 +380,6 @@
 #define PPC_RAW_LQARX(t, a, b, eh) (0x7c000228 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh))
 #define PPC_RAW_LDARX(t, a, b, eh) (0x7ca8 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh))
 #define PPC_RAW_LWARX(t, a, b, eh) (0x7c28 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | __PPC_EH(eh))
-#define PPC_RAW_PHWSYNC(0x7c8004ac)
-#define PPC_RAW_PLWSYNC(0x7ca004ac)
 #define PPC_RAW_STQCX(t, a, b) (0x7c00016d | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_MADDHD(t, a, b, c) (0x1030 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | ___PPC_RC(c))
 #define PPC_RAW_MADDHDU(t, a, b, c)(0x1031 | ___PPC_RT(t) | 
___PPC_RA(a) | ___PPC_RB(b) | ___PPC_RC(c))
@@ -396,6 +396,13 @@
 #define PPC_RAW_RFCI   (0x4c66)
 #define PPC_RAW_RFDI   (0x4c4e)
 #define PPC_RAW_RFMCI  (0x4c4c)
+#define PPC_RAW_SYNC(l, sc)(0x7c0004ac | ___PPC_L(l) | 
___PPC_SC(sc))
+#define PPC_RAW_HWSYNC()   PPC_RAW_SYNC(0, 0)
+#define PPC_RAW_STNCISYNC()PPC_RAW_SYNC(1, 1)
+#define PPC_RAW_STCISYNC() PPC_RAW_SYNC(0, 2)
+#define PPC_RAW_STSYNC()   PPC_RAW_SYNC(0, 3)
+#define PPC_RAW_PHWSYNC()  PPC_RAW_SYNC(4, 0)
+#define PPC_RAW_PLWSYNC()  PPC_RAW_SYNC(5, 0)
 #define PPC_RAW_TLBILX(t, a, b)(0x7c24 | __PPC_T_TLB(t) |  
__PPC_RA0(a) | __PPC_RB(b))
 #define PPC_RAW_WAIT_v203  (0x7c7c)
 #define PPC_RAW_WAIT(w, p) (0x7c3c | __PPC_WC(w) | __PPC_PL(p))
@@ -421,7 +428,6 @@
 #define PPC_RAW_DCBFPS(a, b)   (0x7cac | ___PPC_RA(a) | 
___PPC_RB(b) | (4 << 21))
 #define PPC_RAW_DCBSTPS(a, b)  (0x7cac | ___PPC_RA(a) | 
___PPC_RB(b) | (6 << 21))
 #define PPC_RAW_SC()   (0x4402)
-#define PPC_RAW_SYNC() (0x7c0004ac)
 #define PPC_RAW_ISYNC()(0x4c00012c)
 
 /*
@@ -641,8 +647,11 @@
 #define STBCIX(s, a, b)stringify_in_c(.long PPC_RAW_STBCIX(s, 
a, b))
 #define PPC_DCBFPS(a, b)   stringify_in_c(.long PPC_RAW_DCBFPS(a, b))
 #define PPC_DCBSTPS(a, b)  stringify_in_c(.long PPC_RAW_DCBSTPS(a, b))
-#define PPC_PHWSYNCstringify_in_c(.long PPC_RAW_PHWSYNC)
-#define PPC_PLWSYNCstringify_in_c(.long PPC_RAW_PLWSYNC)
+#define PPC_STNCISYNC  stringify_in_c(.long PPC_RAW_STNCISYNC())
+#define PPC_STCISYNC   stringify_in_c(.long PPC_RAW_STCISYNC())
+#define PPC_STSYNC stringify_in_c(.long PPC_RAW_STSYNC())
+#define PPC_PHWSYNCstringify_in_c(.long PPC_RAW_PHWSYNC())
+#define PPC_PLWSYNCstringify_in_c(.long PPC_RAW_PLWSYNC())
 #define STXVD2X(s, a, b)   stringify_in_c(.long PPC_RAW_STXVD2X(s, a, b))
 #define LXVD2X(s, a, b)stringify_in_c(.long PPC_RAW_LXVD2X(s, 
a, b))
 #define MFVRD(a, t)stringify_in_c(.long PPC_RAW_MFVRD(a, t))
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9bdd79aa51cf..4b216c208f41 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -550,7 +550,7 @@ static inline int check_io_access(struct pt_regs *regs)
nip -= 2;
else if (*nip == PPC_RAW_ISYNC())
--nip;
-   if (*nip == PPC_RAW_SYNC() || get_op(*nip) == OP_TRAP) {
+   if (*nip == PPC_RAW_HWSYNC() || get_op(*nip) == OP_TRAP) {
unsigned int rb;
 
--nip;
diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 80def1c2afcb..4c6e7111354f 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/pow

Re: [PATCH mm-unstable v2 00/10] mm/kvm: locklessly clear the accessed bit

2023-06-09 Thread Paolo Bonzini

On 5/27/23 01:44, Yu Zhao wrote:

TLDR

This patchset adds a fast path to clear the accessed bit without
taking kvm->mmu_lock. It can significantly improve the performance of
guests when the host is under heavy memory pressure.

ChromeOS has been using a similar approach [1] since mid 2021 and it
was proven successful on tens of millions devices.

This v2 addressed previous requests [2] on refactoring code, removing
inaccurate/redundant texts, etc.

[1]https://crrev.com/c/2987928
[2]https://lore.kernel.org/r/20230217041230.2417228-1-yuz...@google.com/


From the KVM point of view the patches look good (though I wouldn't 
mind if Nicholas took a look at the ppc part).  Jason's comment on the 
MMU notifier side are promising as well.  Can you send v3 with Oliver's 
comments addressed?


Thanks,

Paolo



Re: [PATCH mm-unstable v2 09/10] kvm/x86: add kvm_arch_test_clear_young()

2023-06-09 Thread Paolo Bonzini

On 5/27/23 01:44, Yu Zhao wrote:

+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+   return IS_ENABLED(CONFIG_X86_64) &&
+  (!IS_REACHABLE(CONFIG_KVM) || (tdp_mmu_enabled && 
shadow_accessed_mask));
+}


I don't think you need IS_REACHABLE(CONFIG_KVM) here, it would be a bug 
if this is called from outside KVM code.


Maybe make it a BUILD_BUG_ON?

Paolo



Re: [PATCH mm-unstable v2 01/10] mm/kvm: add mmu_notifier_ops->test_clear_young()

2023-06-09 Thread Paolo Bonzini

On 5/31/23 21:17, Jason Gunthorpe wrote:

+   int (*test_clear_young)(struct mmu_notifier *mn, struct mm_struct *mm,
+   unsigned long start, unsigned long end,
+   bool clear, unsigned long *bitmap);
+

Why leave clear_young behind? Just make a NULL bitmap mean
clear_young?


It goes away in patch 2, together with:

@@ -437,7 +412,7 @@ static inline int mmu_notifier_clear_young(struct mm_struct 
*mm,
   unsigned long end)
 {
if (mm_has_notifiers(mm))
-   return __mmu_notifier_clear_young(mm, start, end);
+   return __mmu_notifier_test_clear_young(mm, start, end, true, 
NULL);
return 0;
 }
 
@@ -445,7 +420,7 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,

  unsigned long address)
 {
if (mm_has_notifiers(mm))
-   return __mmu_notifier_test_young(mm, address);
+   return __mmu_notifier_test_clear_young(mm, address, address + 
1, false, NULL);
return 0;
 }
 


Paolo



Re: [PATCH] powerpc/legacy_serial: check CONFIG_SERIAL_8250_CONSOLE

2023-06-09 Thread Uwe Kleine-König
Hello Randy,

On Thu, Jun 08, 2023 at 05:33:28PM -0700, Randy Dunlap wrote:
> When SERIAL_8250_CONSOLE is not set but PPC_UDBG_16550=y,
> the legacy_serial code references fsl8250_handle_irq, which is
> only built when SERIAL_8250_CONSOLE is set.
> 
> Be consistent in referencing the used CONFIG_SERIAL_8250*
> symbols so that the build errors do not happen.
> 
> Prevents these build errors:
> 
> powerpc-linux-ld: arch/powerpc/kernel/legacy_serial.o: in function 
> `serial_dev_init':
> legacy_serial.c:(.init.text+0x2aa): undefined reference to 
> `fsl8250_handle_irq'
> powerpc-linux-ld: legacy_serial.c:(.init.text+0x2b2): undefined reference to 
> `fsl8250_handle_irq'
> 
> Fixes: 66eff0ef528b ("powerpc/legacy_serial: Warn about 8250 devices operated 
> without active FSL workarounds")
> Signed-off-by: Randy Dunlap 
> Cc: Uwe Kleine-König 
> Cc: Greg Kroah-Hartman 
> Cc: linux-ser...@vger.kernel.org
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/legacy_serial.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -- a/arch/powerpc/kernel/legacy_serial.c 
> b/arch/powerpc/kernel/legacy_serial.c
> --- a/arch/powerpc/kernel/legacy_serial.c
> +++ b/arch/powerpc/kernel/legacy_serial.c
> @@ -508,9 +508,9 @@ static void __init fixup_port_irq(int in
>  
>   port->irq = virq;
>  
> - if (IS_ENABLED(CONFIG_SERIAL_8250) &&
> + if (IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE) &&
>   of_device_is_compatible(np, "fsl,ns16550")) {
> - if (IS_REACHABLE(CONFIG_SERIAL_8250)) {
> + if (IS_REACHABLE(CONFIG_SERIAL_8250_CONSOLE)) {
>   port->handle_irq = fsl8250_handle_irq;
>   port->has_sysrq = 
> IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
>   } else {

Argh, indeed there is a problem. Your patch however defeats the idea of
66eff0ef528b[1] which is still valid. And with your patch the else
branch that starts at the end of the above hunk is never taken.

With the feedback I got on
https://lore.kernel.org/linux-serial/20230605130857.85543-3-u.kleine-koe...@pengutronix.de
the probable outcome is that CONFIG_SERIAL_8250_FSL becomes tristate and
so the fix that is more future proof and keeps the warning, looks as
follows:

diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c
index fdbd85aafeb1..6ee65741dbd5 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -510,7 +510,7 @@ static void __init fixup_port_irq(int index,
 
if (IS_ENABLED(CONFIG_SERIAL_8250) &&
of_device_is_compatible(np, "fsl,ns16550")) {
-   if (IS_REACHABLE(CONFIG_SERIAL_8250)) {
+   if (IS_REACHABLE(CONFIG_SERIAL_8250_FSL)) {
port->handle_irq = fsl8250_handle_irq;
port->has_sysrq = 
IS_ENABLED(CONFIG_SERIAL_8250_CONSOLE);
} else {

This should do the right thing now (while CONFIG_SERIAL_8250_FSL is
still bool and only on if CONFIG_SERIAL_8250 is =y) and also once
CONFIG_SERIAL_8250_FSL can be =m (which would make fsl8250_handle_irq
not available for powerpc platform code).

But given that I screwed this up several times now, I will think about
this some more and do some more tests before submitting that as a proper
patch.

Best regards
Uwe

[1] Warn if the 8250 device is used but the required FSL workarounds are
not.

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |


signature.asc
Description: PGP signature
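
The IS_ENABLED()/IS_REACHABLE() distinction at the heart of this fix can be modelled with a small truth-table sketch. This is a simplification for illustration (real IS_REACHABLE() is decided at preprocessing time via the MODULE define, not a runtime parameter):

```python
def is_enabled(sym):
    # IS_ENABLED(CONFIG_FOO): true for =y or =m.
    return sym in ("y", "m")

def is_reachable(sym, caller_is_builtin=True):
    # IS_REACHABLE(CONFIG_FOO): the symbol can be linked against from
    # the caller. For built-in callers (like legacy_serial.c) only =y
    # qualifies, since =m symbols live in a module.
    return sym == "y" or (sym == "m" and not caller_is_builtin)
```

With SERIAL_8250_FSL=m and a built-in caller, `is_enabled` is true but `is_reachable` is false, so the warning branch is taken instead of referencing the unavailable fsl8250_handle_irq.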


[PATCH v2 07/23 fix] mips: update_mmu_cache() can replace __update_tlb(): fix

2023-06-09 Thread Hugh Dickins
I expect this to fix the
arch/mips/mm/tlb-r4k.c:300:16: warning: variable 'pmdp' set but not used
reported by the kernel test robot; but I am uncomfortable rearranging
lines in this tlb_probe_hazard() area, and would be glad for review and
testing by someone familiar with mips - thanks in advance!

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202306091304.cnvispk0-...@intel.com/
Signed-off-by: Hugh Dickins 
---
 arch/mips/mm/tlb-r4k.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index c96725d17cab..80fc90d8d2f1 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -293,11 +293,13 @@ void local_flush_tlb_one(unsigned long page)
 void update_mmu_cache(struct vm_area_struct *vma,
  unsigned long address, pte_t *ptep)
 {
-   unsigned long flags;
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
pgd_t *pgdp;
p4d_t *p4dp;
pud_t *pudp;
pmd_t *pmdp;
+#endif
+   unsigned long flags;
int idx, pid;
 
/*
@@ -316,15 +318,15 @@ void update_mmu_cache(struct vm_area_struct *vma,
pid = read_c0_entryhi() & cpu_asid_mask(¤t_cpu_data);
write_c0_entryhi(address | pid);
}
-   pgdp = pgd_offset(vma->vm_mm, address);
mtc0_tlbw_hazard();
tlb_probe();
tlb_probe_hazard();
+   idx = read_c0_index();
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+   pgdp = pgd_offset(vma->vm_mm, address);
p4dp = p4d_offset(pgdp, address);
pudp = pud_offset(p4dp, address);
pmdp = pmd_offset(pudp, address);
-   idx = read_c0_index();
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
/* this could be a huge page  */
if (ptep == (pte_t *)pmdp) {
unsigned long lo;
-- 
2.35.3



[PATCH v6 17/19] powerpc: mm: Convert to GENERIC_IOREMAP

2023-06-09 Thread Baoquan He
From: Christophe Leroy 

By adopting the GENERIC_IOREMAP method, the generic
generic_ioremap_prot(), generic_iounmap(), and their generic wrappers
ioremap_prot(), ioremap() and iounmap() are all visible and available
to the arch. The arch needs to provide wrapper functions to override
the generic versions if there's arch-specific handling in its
ioremap_prot(), ioremap() or iounmap(). This change simplifies the
implementation by removing code duplicated with generic_ioremap_prot()
and generic_iounmap(), and has equivalent functionality as before.

Here, add wrapper functions ioremap_prot() and iounmap() for powerpc's
extra handling in ioremap() and iounmap().

Signed-off-by: Christophe Leroy 
Signed-off-by: Baoquan He 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Mike Rapoport (IBM) 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/io.h |  8 +++-
 arch/powerpc/mm/ioremap.c | 26 +-
 arch/powerpc/mm/ioremap_32.c  | 19 +--
 arch/powerpc/mm/ioremap_64.c  | 12 ++--
 5 files changed, 16 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bff5820b7cda..aadb280a539e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -194,6 +194,7 @@ config PPC
select GENERIC_CPU_VULNERABILITIES  if PPC_BARRIER_NOSPEC
select GENERIC_EARLY_IOREMAP
select GENERIC_GETTIMEOFDAY
+   select GENERIC_IOREMAP
select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_PCI_IOMAPif PCI
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 67a3fb6de498..0732b743e099 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -889,8 +889,8 @@ static inline void iosync(void)
  *
  */
 extern void __iomem *ioremap(phys_addr_t address, unsigned long size);
-extern void __iomem *ioremap_prot(phys_addr_t address, unsigned long size,
- unsigned long flags);
+#define ioremap ioremap
+#define ioremap_prot ioremap_prot
 extern void __iomem *ioremap_wc(phys_addr_t address, unsigned long size);
 #define ioremap_wc ioremap_wc
 
@@ -904,14 +904,12 @@ void __iomem *ioremap_coherent(phys_addr_t address, 
unsigned long size);
 #define ioremap_cache(addr, size) \
ioremap_prot((addr), (size), pgprot_val(PAGE_KERNEL))
 
-extern void iounmap(volatile void __iomem *addr);
+#define iounmap iounmap
 
 void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size);
 
 int early_ioremap_range(unsigned long ea, phys_addr_t pa,
unsigned long size, pgprot_t prot);
-void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long 
size,
-pgprot_t prot, void *caller);
 
 extern void __iomem *__ioremap_caller(phys_addr_t, unsigned long size,
  pgprot_t prot, void *caller);
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 4f12504fb405..705e8e8ffde4 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -41,7 +41,7 @@ void __iomem *ioremap_coherent(phys_addr_t addr, unsigned long size)
return __ioremap_caller(addr, size, prot, caller);
 }
 
-void __iomem *ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
+void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long flags)
 {
pte_t pte = __pte(flags);
void *caller = __builtin_return_address(0);
@@ -74,27 +74,3 @@ int early_ioremap_range(unsigned long ea, phys_addr_t pa,
 
return 0;
 }
-
-void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
-			 pgprot_t prot, void *caller)
-{
-   struct vm_struct *area;
-   int ret;
-   unsigned long va;
-
-	area = __get_vm_area_caller(size, VM_IOREMAP, IOREMAP_START, IOREMAP_END, caller);
-   if (area == NULL)
-   return NULL;
-
-   area->phys_addr = pa;
-   va = (unsigned long)area->addr;
-
-   ret = ioremap_page_range(va, va + size, pa, prot);
-   if (!ret)
-   return (void __iomem *)area->addr + offset;
-
-   vunmap_range(va, va + size);
-   free_vm_area(area);
-
-   return NULL;
-}
diff --git a/arch/powerpc/mm/ioremap_32.c b/arch/powerpc/mm/ioremap_32.c
index 9d13143b8be4..ca5bc6be3e6f 100644
--- a/arch/powerpc/mm/ioremap_32.c
+++ b/arch/powerpc/mm/ioremap_32.c
@@ -21,6 +21,13 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *call
phys_addr_t p, offset;
int err;
 
+   /*
+* If the address lies within the first 16 MB, assume it's in ISA
+* memory space
+*/
+   if (addr < SZ_16M)
+   addr += _ISA_MEM_BASE;
+
/*
 * Choose an address to map it to.
	 * Once the vmalloc system is running, we use it.

[PATCH v2] powerpc/fadump: invoke ibm,os-term with rtas_call_unlocked()

2023-06-09 Thread Hari Bathini
Invoke the ibm,os-term RTAS call with rtas_call_unlocked(), i.e. without
taking the RTAS spinlock, to avoid a deadlock in the unlikely event that
the machine crashes while an RTAS call is in progress.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/rtas.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c087320f..a8192e5b1a5f 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1587,6 +1587,7 @@ static bool ibm_extended_os_term;
 void rtas_os_term(char *str)
 {
s32 token = rtas_function_token(RTAS_FN_IBM_OS_TERM);
+   static struct rtas_args args;
int status;
 
/*
@@ -1607,7 +1608,8 @@ void rtas_os_term(char *str)
 * schedules.
 */
do {
-   status = rtas_call(token, 1, 1, NULL, __pa(rtas_os_term_buf));
+		rtas_call_unlocked(&args, token, 1, 1, NULL, __pa(rtas_os_term_buf));
+   status = be32_to_cpu(args.rets[0]);
} while (rtas_busy_delay_time(status));
 
if (status != 0)
-- 
2.40.1
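The retry loop the patch changes can be sketched outside the kernel with
stubbed-out helpers. Everything below (the `rtas_args` layout, the
`rtas_call_unlocked()` / `rtas_busy_delay_time()` stubs, the `RTAS_BUSY`
value) is a simplified stand-in for the kernel's definitions, not the real
API: the real `rtas_call_unlocked()` traps into firmware, the status must
go through `be32_to_cpu()`, and `rtas_busy_delay_time()` actually delays.
The point of the sketch is only the control flow: the status is read back
from the caller-provided args block instead of being returned by a locked
`rtas_call()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTAS_BUSY (-2)		/* simplified: the kernel knows several busy codes */

/* Stand-in for the kernel's struct rtas_args; only the status slot matters here. */
struct rtas_args {
	int32_t rets[1];	/* firmware writes the call status into rets[0] */
};

static int call_count;

/* Stub: pretend firmware reports "busy" twice before the call succeeds. */
static void rtas_call_unlocked(struct rtas_args *args, int token,
			       int nargs, int nret, int32_t *outputs,
			       uintptr_t buf_pa)
{
	(void)token; (void)nargs; (void)nret; (void)outputs; (void)buf_pa;
	call_count++;
	args->rets[0] = (call_count < 3) ? RTAS_BUSY : 0;
}

/* Stub: the real helper also sleeps/delays before asking for a retry. */
static int rtas_busy_delay_time(int status)
{
	return status == RTAS_BUSY;
}

/* Mirrors the patched rtas_os_term() loop: no spinlock is taken, and the
 * status comes from args.rets[0] (endian-converted in the real kernel). */
static int demo_os_term(void)
{
	static struct rtas_args args;
	int status;

	do {
		rtas_call_unlocked(&args, /* token */ 0x1234, 1, 1, NULL, 0);
		status = args.rets[0];
	} while (rtas_busy_delay_time(status));

	return status;
}
```

Because the crash path may run with the RTAS lock already held by the
interrupted context, polling through the unlocked entry point is what keeps
the loop deadlock-free; the busy-retry shape itself is unchanged from the
`rtas_call()` version.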