Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

2024-09-26 Thread Akhil P Oommen
On Fri, Sep 13, 2024 at 12:51:31PM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> devices (x1-85, possibly others), it seems to pass that barrier while
> there are still things in the event completion FIFO waiting to be
> written back to memory.
> 
> Work around that by adding a fence wait before context switch.  The
> CP_EVENT_WRITE that writes the fence is the last write from a submit,
> so seeing this value hit memory is a reliable indication that it is
> safe to proceed with the context switch.
> 
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/63
> Signed-off-by: Rob Clark 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bcaec86ac67a..ba5b35502e6d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -101,9 +101,10 @@ static void get_stats_counter(struct msm_ringbuffer 
> *ring, u32 counter,
>  }
>  
>  static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
> - struct msm_ringbuffer *ring, struct msm_file_private *ctx)
> + struct msm_ringbuffer *ring, struct msm_gem_submit *submit)
>  {
>   bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> + struct msm_file_private *ctx = submit->queue->ctx;
>   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>   phys_addr_t ttbr;
>   u32 asid;
> @@ -115,6 +116,13 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
>   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
>   return;
>  
> + /* Wait for previous submit to complete before continuing: */
> + OUT_PKT7(ring, CP_WAIT_TIMESTAMP, 4);
> + OUT_RING(ring, 0);
> + OUT_RING(ring, lower_32_bits(rbmemptr(ring, fence)));
> + OUT_RING(ring, upper_32_bits(rbmemptr(ring, fence)));
> + OUT_RING(ring, submit->seqno - 1);
> +
>   if (!sysprof) {
>   if (!adreno_is_a7xx(adreno_gpu)) {
>   /* Turn off protected mode to write to special 
> registers */
> @@ -193,7 +201,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   struct msm_ringbuffer *ring = submit->ring;
>   unsigned int i, ibs = 0;
>  
> - a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
> + a6xx_set_pagetable(a6xx_gpu, ring, submit);
>  
>   get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
>   rbmemptr_stats(ring, index, cpcycles_start));
> @@ -283,7 +291,7 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   OUT_PKT7(ring, CP_THREAD_CONTROL, 1);
>   OUT_RING(ring, CP_THREAD_CONTROL_0_SYNC_THREADS | CP_SET_THREAD_BR);
>  
> - a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
> + a6xx_set_pagetable(a6xx_gpu, ring, submit);
>  
>   get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
>   rbmemptr_stats(ring, index, cpcycles_start));
> -- 
> 2.46.0
> 
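[Editor's note: for context, the CP_EVENT_WRITE referred to in the commit message is the packet a6xx_submit() already emits at the tail of every submission. Roughly paraphrased (not part of this patch), it looks like the snippet below, which is why waiting for (submit->seqno - 1) to appear at the fence memptr proves the previous submit's writes have landed:

	/* Tail of a6xx_submit(), paraphrased: write this submit's seqno to the
	 * per-ring fence location as the very last write of the submission. */
	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));  /* writeback + IRQ */
	OUT_RING(ring, lower_32_bits(rbmemptr(ring, fence)));
	OUT_RING(ring, upper_32_bits(rbmemptr(ring, fence)));
	OUT_RING(ring, submit->seqno);
]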


Re: [PATCH v4 00/11] Preemption support for A7XX

2024-09-24 Thread Akhil P Oommen
On Tue, Sep 24, 2024 at 07:47:12AM -0700, Rob Clark wrote:
> On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
>  wrote:
> >
> > On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> > > On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> > >> Hi,
> > >>
> > >> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > >>> This series implements preemption for A7XX targets, which allows the 
> > >>> GPU to
> > >>> switch to a higher priority ring when work is pushed to it, reducing 
> > >>> latency
> > >>> for high priority submissions.
> > >>>
> > >>> This series enables L1 preemption with skip_save_restore which requires
> > >>> the following userspace patches to function:
> > >>>
> > >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > >>>
> > >>> A flag is added to `msm_submitqueue_create` to only allow submissions
> > >>> from compatible userspace to be preempted, therefore maintaining
> > >>> compatibility.
> > >>>
> > >>> Preemption is currently only enabled by default on A750, it can be
> > >>> enabled on other targets through the `enable_preemption` module
> > >>> parameter. This is because more testing is required on other targets.
> > >>>
> > >>> For testing on other HW it is sufficient to set that parameter to a
> > >>> value of 1, then using the branch of mesa linked above, 
> > >>> `TU_DEBUG=hiprio`
> > >>> allows running any application as high priority, therefore preempting
> > >>> submissions from other applications.
> > >>>
> > >>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > >>> added in this series can be used to observe preemption's behavior as
> > >>> well as measuring preemption latency.
> > >>>
> > >>> Some commits from this series are based on a previous series to enable
> > >>> preemption on A6XX targets:
> > >>>
> > >>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> > >>>
> > >>> Signed-off-by: Antonino Maniscalco 
> > >>> ---
> > >>> Changes in v4:
> > >>> - Added missing register in pwrup list
> > >>> - Removed and rearrange barriers
> > >>> - Renamed `skip_inline_wptr` to `restore_wptr`
> > >>> - Track ctx seqno per ring
> > >>> - Removed secure preempt context
> > >>> - NOP out postamble to disable it instantly
> > >>> - Only emit pwrup reglist once
> > >>> - Document bv_rptr_addr
> > >>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > >>> - Set name on preempt record buffer
> > >>> - Link to v3: 
> > >>> https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f...@gmail.com
> > >>>
> > >>> Changes in v3:
> > >>> - Added documentation about preemption
> > >>> - Use quirks to determine which target supports preemption
> > >>> - Add a module parameter to force disabling or enabling preemption
> > >>> - Clear postamble when profiling
> > >>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > >>> - Make preemption records MAP_PRIV
> > >>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > >>> anymore
> > >>> - Link to v2: 
> > >>> https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> > >>>
> > >>> Changes in v2:
> > >>> - Added preempt_record_size for X185 in PATCH 3/7
> > >>> - Added patches to reset perf counters
> > >>> - Dropped unused defines
> > >>> - Dropped unused variable (fixes warning)
> > >>> - Only enable preemption on a750
> > >>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > >>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > >>> - Added Neil's Tested-By tags
> > >>> - Added explanation for UAPI changes in commit message
> > >>> - Link to v1: 
> > >>> https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> > >>>
> > >>> ---
> > >

Re: [PATCH 3/3] arm64: dts: qcom: sa8775p: Add gpu and gmu nodes

2024-09-23 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 12:27:03AM +0300, Dmitry Baryshkov wrote:
> On Wed, Sep 18, 2024 at 02:08:43AM GMT, Akhil P Oommen wrote:
> > From: Puranam V G Tejaswi 
> > 
> > Add gpu and gmu nodes for sa8775p based platforms.
> 
> Which platforms? The commit adds nodes to the SoC and the single RIDE
> platform.
> 
> > 
> > Signed-off-by: Puranam V G Tejaswi 
> > Signed-off-by: Akhil P Oommen 
> > ---
> >  arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi |  8 
> >  arch/arm64/boot/dts/qcom/sa8775p.dtsi  | 75 
> > ++
> >  2 files changed, 83 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi 
> > b/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
> > index 2a6170623ea9..a01e6675c4bb 100644
> > --- a/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
> > @@ -407,6 +407,14 @@ queue3 {
> > };
> >  };
> >  
> > +&gpu {
> > +   status = "okay";
> > +
> > +   zap-shader {
> 
> It's easier to add gpu_zap_shader_link label in the DTSI file and then
> reference it instead of using the subnode again.
> 
> > +   firmware-name = "qcom/sa8775p/a663_zap.mbn";
> > +   };
> > +};
> 
> Separate patch, please.
> 
> > +
> >  &i2c11 {
> > clock-frequency = <40>;
> > pinctrl-0 = <&qup_i2c11_default>;
> > diff --git a/arch/arm64/boot/dts/qcom/sa8775p.dtsi 
> > b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> > index 23f1b2e5e624..12c79135a303 100644
> > --- a/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> > @@ -2824,6 +2824,81 @@ tcsr_mutex: hwlock@1f4 {
> > #hwlock-cells = <1>;
> > };
> >  
> > +   gpu: gpu@3d0 {
> > +   compatible = "qcom,adreno-663.0", "qcom,adreno";
> > +   reg = <0 0x03d0 0 0x4>,
> > + <0 0x03d9e000 0 0x1000>,
> > + <0 0x03d61000 0 0x800>;
> 
> I think it's suggested to use 0x0 now
> 
> > +   reg-names = "kgsl_3d0_reg_memory",
> > +   "cx_mem",
> > +   "cx_dbgc";
> > +   interrupts = ;
> > +   iommus = <&adreno_smmu 0 0xc00>,
> > +<&adreno_smmu 1 0xc00>;
> > +   operating-points-v2 = <&gpu_opp_table>;
> > +   qcom,gmu = <&gmu>;
> > +   interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt 
> > SLAVE_EBI1 0>;
> 
> QCOM_ICC_TAG_ALWAYS instead of 0
> 
> > +   interconnect-names = "gfx-mem";
> > +   #cooling-cells = <2>;
> 
> No speed bins?

Thanks for the review. Agree on all comments.

Speedbins were missed because we are sharing these changes early in the
development cycle, sort of like what we did for ChromeOS development.
Will try to pick them up in the next patchset.

-Akhil

> 
> > +
> > +   status = "disabled";
> > +
> > +   zap-shader {
> 
> gpu_zap_shader: zap-shader
> 
> > +   memory-region = <&pil_gpu_mem>;
> > +   };
> > +
> > +   gpu_opp_table: opp-table {
> > +   compatible = "operating-points-v2";
> > +
> > +   opp-40500 {
> 
> Just a single freq?
> 
> > +   opp-hz = /bits/ 64 <40500>;
> > +   opp-level = 
> > ;
> > +   opp-peak-kBps = <8368000>;
> > +   };
> > +
> 
> Drop the empty line, please.
> 
> > +   };
> > +   };
> > +
> > +   gmu: gmu@3d6a000 {
> > +   compatible = "qcom,adreno-gmu-663.0", "qcom,adreno-gmu";
> > +   reg = <0 0x03d6a000 0 0x34000>,
> > +   <0 0x3de 0 0x1>,
> > +   <0 0x0b29 0 0x1>;
> 
> Wrong indentation, please align to the angle bracket.
> Also I think it's suggested to use 0x0 now
> 
> > +   reg-names = "gmu", "rscc", "gmu_pdc";

Re: [PATCH 0/3] DRM/MSM: Support for Adreno 663 GPU

2024-09-23 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 12:34:32AM +0300, Dmitry Baryshkov wrote:
> On Wed, Sep 18, 2024 at 02:08:40AM GMT, Akhil P Oommen wrote:
> > This series adds support for Adreno 663 gpu found in SA8775P chipsets.
> > The closest gpu which is currently supported in drm-msm is A660.
> > Following are the major differences with that:
> > 1. gmu/zap firmwares
> > 2. Recommended to disable Level2 swizzling
> > 
> > Verified kmscube with the below Mesa change [1]. This series is rebased
> > on top of msm-next.
> 
> Is there a chance of you sharing Vulkan CTS results?

No. As of now there are no plans to run CTS.

-Akhil

> 
> > 
> > Patch (1) & (2) for Rob Clark and Patch (3) for Bjorn
> > 
> > [0] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31211
> > 
> > To: Rob Clark 
> > To: Sean Paul 
> > To: Konrad Dybcio 
> > To: Abhinav Kumar 
> > To: Dmitry Baryshkov 
> > To: Marijn Suijten 
> > To: David Airlie 
> > To: Daniel Vetter 
> > To: Maarten Lankhorst 
> > To: Maxime Ripard 
> > To: Thomas Zimmermann 
> > To: Rob Herring 
> > To: Krzysztof Kozlowski 
> > To: Conor Dooley 
> > To: Bjorn Andersson 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: freedreno@lists.freedesktop.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > Puranam V G Tejaswi (3):
> >   drm/msm/a6xx: Add support for A663
> >   dt-bindings: display/msm/gmu: Add Adreno 663 GMU
> >   arm64: dts: qcom: sa8775p: Add gpu and gmu nodes
> > 
> >  .../devicetree/bindings/display/msm/gmu.yaml   |  1 +
> >  arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi |  8 +++
> >  arch/arm64/boot/dts/qcom/sa8775p.dtsi  | 75 
> > ++
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 19 ++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  8 ++-
> >  drivers/gpu/drm/msm/adreno/a6xx_hfi.c      | 33 ++
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 ++
> >  7 files changed, 148 insertions(+), 1 deletion(-)
> > ---
> > base-commit: 15302579373ed2c8ada629e9e7bcf9569393a48d
> > change-id: 20240917-a663-gpu-support-b1475c828606
> > 
> > Best regards,
> > -- 
> > Akhil P Oommen 
> > 
> 
> -- 
> With best wishes
> Dmitry


Re: [PATCH 1/3] drm/msm/a6xx: Add support for A663

2024-09-23 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 06:51:50PM +0100, Connor Abbott wrote:
> On Tue, Sep 17, 2024 at 9:39 PM Akhil P Oommen  
> wrote:
> >
> > From: Puranam V G Tejaswi 
> >
> > Add support for Adreno 663 found on sa8775p based platforms.
> >
> > Signed-off-by: Puranam V G Tejaswi 
> > Signed-off-by: Akhil P Oommen 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  8 +++-
> >  drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 33 
> > +++
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  5 +
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > index 0312b6ee0356..8d8d0d7630f0 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > @@ -972,6 +972,25 @@ static const struct adreno_info a6xx_gpus[] = {
> > .prim_fifo_threshold = 0x00300200,
> > },
> > .address_space_size = SZ_16G,
> > +   }, {
> > +   .chip_ids = ADRENO_CHIP_IDS(0x06060300),
> > +   .family = ADRENO_6XX_GEN4,
> > +   .fw = {
> > +   [ADRENO_FW_SQE] = "a660_sqe.fw",
> > +   [ADRENO_FW_GMU] = "a663_gmu.bin",
> > +   },
> > +   .gmem = SZ_1M + SZ_512K,
> > +   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> > +   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> > +   ADRENO_QUIRK_HAS_HW_APRIV,
> > +   .init = a6xx_gpu_init,
> > +   .a6xx = &(const struct a6xx_info) {
> > +   .hwcg = a690_hwcg,
> > +   .protect = &a660_protect,
> > +   .gmu_cgc_mode = 0x00020200,
> > +   .prim_fifo_threshold = 0x00300200,
> > +   },
> > +   .address_space_size = SZ_16G,
> > }, {
> > .chip_ids = ADRENO_CHIP_IDS(0x06030500),
> > .family = ADRENO_6XX_GEN4,
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 06cab2c6fd66..e317780caeae 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -541,6 +541,12 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu 
> > *gpu)
> > gpu->ubwc_config.macrotile_mode = 1;
> > }
> >
> > +   if (adreno_is_a663(gpu)) {
> > +   gpu->ubwc_config.highest_bank_bit = 13;
> > +   gpu->ubwc_config.ubwc_swizzle = 0x4;
> 
> It's already been mentioned in the Mesa MR, but since this is the
> first GPU with level2_swizzling_dis set, the relevant vulkan CTS tests
> need to be tested with
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26578
> rebased on your Mesa enablement patch.

Will check this, but I prefer not to block the DRM-side patches on this.

-Akhil.

> 
> > +   gpu->ubwc_config.macrotile_mode = 1;
> > +   }
> > +
> > if (adreno_is_7c3(gpu)) {
> > gpu->ubwc_config.highest_bank_bit = 14;
> > gpu->ubwc_config.amsbc = 1;
> > @@ -1062,7 +1068,7 @@ static int hw_init(struct msm_gpu *gpu)
> > if (adreno_is_a690(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x90);
> > /* Set dualQ + disable afull for A660 GPU */
> > -   else if (adreno_is_a660(adreno_gpu))
> > +   else if (adreno_is_a660(adreno_gpu) || adreno_is_a663(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
> > else if (adreno_is_a7xx(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG,
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > index cdb3f6e74d3e..f1196d66055c 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > @@ -478,6 +478,37 @@ static void a660_build_bw_table(struct 
> > a6xx_hfi_msg_bw_table *msg)
> > msg->cnoc_cmds_data[1][0] =  0x6001;
> >  }
> >
> > +static void a663_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
> > +{
> > +   /*
> > +* Send a single "off" entry just to get things running

Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create

2024-09-23 Thread Akhil P Oommen
On Fri, Sep 20, 2024 at 10:29:44AM -0700, Rob Clark wrote:
> On Fri, Sep 20, 2024 at 9:54 AM Akhil P Oommen  
> wrote:
> >
> > On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> > > Some userspace changes are necessary so add a flag for userspace to
> > > advertise support for preemption when creating the submitqueue.
> > >
> > > When this flag is not set preemption will not be allowed in the middle
> > > of the submitted IBs, therefore maintaining compatibility with older
> > > userspace.
> > >
> > > The flag is rejected if preemption is not supported on the target, this
> > > allows userspace to know whether preemption is supported.
> >
> > Just curious, what is the motivation behind informing userspace about
> > preemption support?
> 
> I think I requested that, as a "just in case" (because it would
> otherwise be awkward if we later needed to know the difference btwn
> drm/sched "preemption" which can only happen before submit is written
> to ring and "real" preemption)

Thanks. That makes sense.

-Akhil

> 
> BR,
> -R
> 
> > -Akhil
> >
> > >
> > > Signed-off-by: Antonino Maniscalco 
> > > Tested-by: Neil Armstrong  # on SM8650-QRD
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 
> > >  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
> > >  include/uapi/drm/msm_drm.h|  5 -
> > >  3 files changed, 15 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 736f475d696f..edbcb6d229ba 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit)
> > >   OUT_PKT7(ring, CP_SET_MARKER, 1);
> > >   OUT_RING(ring, 0x101); /* IFPC disable */
> > >
> > > - OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > - OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > + OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > + OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > + }
> > >
> > >   /* Submit the commands */
> > >   for (i = 0; i < submit->nr_cmds; i++) {
> > > @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit)
> > >   update_shadow_rptr(gpu, ring);
> > >   }
> > >
> > > - OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > - OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > + OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > + OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > + }
> > >
> > >   get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
> > >   rbmemptr_stats(ring, index, cpcycles_end));
> > > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
> > > b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > index 0e803125a325..9b3ffca3f3b4 100644
> > > --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> > > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, 
> > > struct msm_file_private *ctx,
> > >   if (!priv->gpu)
> > >   return -ENODEV;
> > >
> > > + if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 
> > > 1)
> > > + return -EINVAL;
> > > +
> > >   ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, 
> > > &sched_prio);
> > >   if (ret)
> > >   return ret;
> > > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > > index 3fca72f73861..f37858db34e6 100644
> > > --- a/include/uapi/drm/msm_drm.h
> > > +++ b/include/uapi/drm/msm_drm.h
> > > @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
> > >   * backwards compatibility as a "default" submitqueue
> > >   */
> > >
> > > -#define MSM_SUBMITQUEUE_FLAGS (0)
> > > +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT0x0001
> > > +#define MSM_SUBMITQUEUE_FLAGS( \
> > > + MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> > > + 0)
> > >
> > >  /*
> > >   * The submitqueue priority should be between 0 and 
> > > MSM_PARAM_PRIORITIES-1,
> > >
> > > --
> > > 2.46.0
> > >
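[Editor's note: as an aside, a minimal userspace-side sketch of how the rejection can double as a capability probe. This is a hypothetical helper, not from this series; the struct and ioctl names from include/uapi/drm/msm_drm.h and libdrm's drmIoctl() are assumed:

	#include <errno.h>
	#include <stdint.h>
	#include <xf86drm.h>
	#include <drm/msm_drm.h>

	/* Try to create a preemptible queue; if the kernel or target rejects
	 * the flag with -EINVAL, fall back to a plain submitqueue. */
	static int create_queue(int fd, uint32_t prio, uint32_t *queue_id)
	{
		struct drm_msm_submitqueue req = {
			.flags = MSM_SUBMITQUEUE_ALLOW_PREEMPT,
			.prio  = prio,
		};

		if (drmIoctl(fd, DRM_IOCTL_MSM_SUBMITQUEUE_NEW, &req)) {
			req.flags = 0;
			if (drmIoctl(fd, DRM_IOCTL_MSM_SUBMITQUEUE_NEW, &req))
				return -errno;
		}

		*queue_id = req.id;
		return 0;
	}
]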


Re: [PATCH 1/3] drm/msm/a6xx: Add support for A663

2024-09-20 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 12:31:55AM +0300, Dmitry Baryshkov wrote:
> On Wed, Sep 18, 2024 at 02:08:41AM GMT, Akhil P Oommen wrote:
> > From: Puranam V G Tejaswi 
> > 
> > Add support for Adreno 663 found on sa8775p based platforms.
> > 
> > Signed-off-by: Puranam V G Tejaswi 
> > Signed-off-by: Akhil P Oommen 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  8 +++-
> >  drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 33 
> > +++
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  5 +
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > index 0312b6ee0356..8d8d0d7630f0 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > @@ -972,6 +972,25 @@ static const struct adreno_info a6xx_gpus[] = {
> > .prim_fifo_threshold = 0x00300200,
> > },
> > .address_space_size = SZ_16G,
> > +   }, {
> > +   .chip_ids = ADRENO_CHIP_IDS(0x06060300),
> > +   .family = ADRENO_6XX_GEN4,
> > +   .fw = {
> > +   [ADRENO_FW_SQE] = "a660_sqe.fw",
> > +   [ADRENO_FW_GMU] = "a663_gmu.bin",
> > +   },
> > +   .gmem = SZ_1M + SZ_512K,
> > +   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> > +   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> > +   ADRENO_QUIRK_HAS_HW_APRIV,
> > +   .init = a6xx_gpu_init,
> > +   .a6xx = &(const struct a6xx_info) {
> > +   .hwcg = a690_hwcg,
> > +   .protect = &a660_protect,
> > +   .gmu_cgc_mode = 0x00020200,
> > +   .prim_fifo_threshold = 0x00300200,
> > +   },
> > +   .address_space_size = SZ_16G,
> > }, {
> > .chip_ids = ADRENO_CHIP_IDS(0x06030500),
> > .family = ADRENO_6XX_GEN4,
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 06cab2c6fd66..e317780caeae 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -541,6 +541,12 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu 
> > *gpu)
> > gpu->ubwc_config.macrotile_mode = 1;
> > }
> >  
> > +   if (adreno_is_a663(gpu)) {
> > +   gpu->ubwc_config.highest_bank_bit = 13;
> > +   gpu->ubwc_config.ubwc_swizzle = 0x4;
> > +   gpu->ubwc_config.macrotile_mode = 1;
> 
> If this looks like A660 / A690, shouldn't the driver also enable .amsbc,
> .rgb565_predicator and .uavflagprd_inv?

You are right! Will fix in next patchset.

-Akhil.
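[Editor's note: for illustration, the fixed-up block could look something like this, assuming A663 really does mirror the A660/A690 UBWC settings; the values are a guess pending the next revision:

	if (adreno_is_a663(gpu)) {
		gpu->ubwc_config.highest_bank_bit = 13;
		gpu->ubwc_config.amsbc = 1;
		gpu->ubwc_config.rgb565_predicator = 1;
		gpu->ubwc_config.uavflagprd_inv = 2;
		gpu->ubwc_config.ubwc_swizzle = 0x4;
		gpu->ubwc_config.macrotile_mode = 1;
	}
]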

> 
> > +   }
> > +
> > if (adreno_is_7c3(gpu)) {
> > gpu->ubwc_config.highest_bank_bit = 14;
> > gpu->ubwc_config.amsbc = 1;
> > @@ -1062,7 +1068,7 @@ static int hw_init(struct msm_gpu *gpu)
> > if (adreno_is_a690(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x90);
> > /* Set dualQ + disable afull for A660 GPU */
> > -   else if (adreno_is_a660(adreno_gpu))
> > +   else if (adreno_is_a660(adreno_gpu) || adreno_is_a663(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
> > else if (adreno_is_a7xx(adreno_gpu))
> > gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG,
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > index cdb3f6e74d3e..f1196d66055c 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > @@ -478,6 +478,37 @@ static void a660_build_bw_table(struct 
> > a6xx_hfi_msg_bw_table *msg)
> > msg->cnoc_cmds_data[1][0] =  0x6001;
> >  }
> >  
> > +static void a663_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
> > +{
> > +   /*
> > +* Send a single "off" entry just to get things running
> > +* TODO: bus scaling
> > +*/
> > +   msg->bw_level_num = 1;
> > +
> > +   msg->ddr_cmds_num = 3;
> > +   msg->ddr_wait_bitmask = 0x07;
> > +
> > +   msg->ddr_cmds_addrs[0] = 0x50004;
> > +   msg->ddr_cmds_addrs[1] = 0x5;
> > +   msg->ddr_c

Re: [PATCH v4 00/11] Preemption support for A7XX

2024-09-20 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> Hi,
> 
> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > This series implements preemption for A7XX targets, which allows the GPU to
> > switch to a higher priority ring when work is pushed to it, reducing 
> > latency
> > for high priority submissions.
> > 
> > This series enables L1 preemption with skip_save_restore which requires
> > the following userspace patches to function:
> > 
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > 
> > A flag is added to `msm_submitqueue_create` to only allow submissions
> > from compatible userspace to be preempted, therefore maintaining
> > compatibility.
> > 
> > Preemption is currently only enabled by default on A750, it can be
> > enabled on other targets through the `enable_preemption` module
> > parameter. This is because more testing is required on other targets.
> > 
> > For testing on other HW it is sufficient to set that parameter to a
> > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > allows running any application as high priority, therefore preempting
> > submissions from other applications.
> > 
> > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > added in this series can be used to observe preemption's behavior as
> > well as measuring preemption latency.
> > 
> > Some commits from this series are based on a previous series to enable
> > preemption on A6XX targets:
> > 
> > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> > 
> > Signed-off-by: Antonino Maniscalco 
> > ---
> > Changes in v4:
> > - Added missing register in pwrup list
> > - Removed and rearrange barriers
> > - Renamed `skip_inline_wptr` to `restore_wptr`
> > - Track ctx seqno per ring
> > - Removed secure preempt context
> > - NOP out postamble to disable it instantly
> > - Only emit pwrup reglist once
> > - Document bv_rptr_addr
> > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > - Set name on preempt record buffer
> > - Link to v3: 
> > https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f...@gmail.com
> > 
> > Changes in v3:
> > - Added documentation about preemption
> > - Use quirks to determine which target supports preemption
> > - Add a module parameter to force disabling or enabling preemption
> > - Clear postamble when profiling
> > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > - Make preemption records MAP_PRIV
> > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >anymore
> > - Link to v2: 
> > https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> > 
> > Changes in v2:
> > - Added preempt_record_size for X185 in PATCH 3/7
> > - Added patches to reset perf counters
> > - Dropped unused defines
> > - Dropped unused variable (fixes warning)
> > - Only enable preemption on a750
> > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > - Added Neil's Tested-By tags
> > - Added explanation for UAPI changes in commit message
> > - Link to v1: 
> > https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> > 
> > ---
> > Antonino Maniscalco (11):
> >drm/msm: Fix bv_fence being used as bv_rptr
> >drm/msm/A6XX: Track current_ctx_seqno per ring
> >drm/msm: Add a `preempt_record_size` field
> >drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >drm/msm/A6xx: Implement preemption for A7XX targets
> >drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >drm/msm/A6xx: Use posamble to reset counters on preemption
> >drm/msm/A6xx: Add traces for preemption
> >drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >drm/msm/A6xx: Enable preemption for A750
> >Documentation: document adreno preemption
> > 
> >   Documentation/gpu/msm-preemption.rst   |  98 +
> >   drivers/gpu/drm/msm/Makefile   |   1 +
> >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |   6 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |   7 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 325 ++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h  | 174 
> >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c  | 440 
> > +
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|   9 +-
> >   drivers/gpu/drm/msm/msm_drv.c  |   4 +
> >   drivers/gpu/drm/msm/msm_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/msm_gpu.h  |  11 -
> >   drivers/gpu/drm/msm/msm_gpu_trace.h|  28 ++
> >   drivers/gpu/drm/

Re: [PATCH v4 10/11] drm/msm/A6xx: Enable preemption for A750

2024-09-20 Thread Akhil P Oommen
On Tue, Sep 17, 2024 at 01:14:20PM +0200, Antonino Maniscalco wrote:
> Initialize with 4 rings to enable preemption.
> 
> For now only on A750 as other targets require testing.
> 
> Add the "enable_preemption" module parameter to override this for other
> A7xx targets.
> 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 3 ++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   | 1 +
>  drivers/gpu/drm/msm/msm_drv.c | 4 
>  4 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> index 316f23ca9167..0e3041b29419 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -1240,7 +1240,8 @@ static const struct adreno_info a7xx_gpus[] = {
>   .gmem = 3 * SZ_1M,
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
>   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> -   ADRENO_QUIRK_HAS_HW_APRIV,
> +   ADRENO_QUIRK_HAS_HW_APRIV |
> +   ADRENO_QUIRK_PREEMPTION,
>   .init = a6xx_gpu_init,
>   .zapfw = "gen70900_zap.mbn",
>   .a6xx = &(const struct a6xx_info) {
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index edbcb6d229ba..4760f9469613 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2529,6 +2529,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>   struct a6xx_gpu *a6xx_gpu;
>   struct adreno_gpu *adreno_gpu;
>   struct msm_gpu *gpu;
> + extern int enable_preemption;
>   bool is_a7xx;
>   int ret;
>  
> @@ -2567,7 +2568,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>   return ERR_PTR(ret);
>   }
>  
> - if (is_a7xx)
> + if ((enable_preemption == 1) || (enable_preemption == -1 &&
> + (config->info->quirks & ADRENO_QUIRK_PREEMPTION)))
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4);
> + else if (is_a7xx)
>   ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
>   else if (adreno_has_gmu_wrapper(adreno_gpu))
>   ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 
> 1);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 87098567483b..d1cd53f05de6 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -56,6 +56,7 @@ enum adreno_family {
>  #define ADRENO_QUIRK_LMLOADKILL_DISABLE  BIT(2)
>  #define ADRENO_QUIRK_HAS_HW_APRIVBIT(3)
>  #define ADRENO_QUIRK_HAS_CACHED_COHERENT BIT(4)
> +#define ADRENO_QUIRK_PREEMPTION  BIT(5)
>  
>  /* Helper for formating the chip_id in the way that userspace tools like
>   * crashdec expect.
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 9c33f4e3f822..7c64b20f5e3b 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -58,6 +58,10 @@ static bool modeset = true;
>  MODULE_PARM_DESC(modeset, "Use kernel modesetting [KMS] (1=on (default), 
> 0=disable)");
>  module_param(modeset, bool, 0600);
>  
> +int enable_preemption = -1;
> +MODULE_PARM_DESC(enable_preemption, "Enable preemption (A7xx only) (1=on , 
> 0=disable, -1=auto (default))");
> +module_param(enable_preemption, int, 0600);
> +

Is adreno_device.c a better place for adreno-specific module params?

-Akhil.
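[Editor's note: for what it's worth, a rough sketch of that alternative, purely illustrative; adreno_device.c already hosts other adreno module parameters, and the extern would replace the function-local one used in a6xx_gpu_init():

	/* drivers/gpu/drm/msm/adreno/adreno_device.c */
	int enable_preemption = -1;
	MODULE_PARM_DESC(enable_preemption, "Enable preemption (A7xx only) (1=on, 0=disable, -1=auto (default))");
	module_param(enable_preemption, int, 0600);

	/* drivers/gpu/drm/msm/adreno/adreno_gpu.h */
	extern int enable_preemption;
]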

>  #ifdef CONFIG_FAULT_INJECTION
>  DECLARE_FAULT_ATTR(fail_gem_alloc);
>  DECLARE_FAULT_ATTR(fail_gem_iova);
> 
> -- 
> 2.46.0
> 


Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create

2024-09-20 Thread Akhil P Oommen
On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> Some userspace changes are necessary so add a flag for userspace to
> advertise support for preemption when creating the submitqueue.
> 
> When this flag is not set preemption will not be allowed in the middle
> of the submitted IBs, therefore maintaining compatibility with older
> userspace.
> 
> The flag is rejected if preemption is not supported on the target, this
> allows userspace to know whether preemption is supported.

Just curious, what is the motivation behind informing userspace about
preemption support?

-Akhil

> 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 
>  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
>  include/uapi/drm/msm_drm.h|  5 -
>  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 736f475d696f..edbcb6d229ba 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   OUT_PKT7(ring, CP_SET_MARKER, 1);
>   OUT_RING(ring, 0x101); /* IFPC disable */
>  
> - OUT_PKT7(ring, CP_SET_MARKER, 1);
> - OUT_RING(ring, 0x00d); /* IB1LIST start */
> + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> + OUT_PKT7(ring, CP_SET_MARKER, 1);
> + OUT_RING(ring, 0x00d); /* IB1LIST start */
> + }
>  
>   /* Submit the commands */
>   for (i = 0; i < submit->nr_cmds; i++) {
> @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   update_shadow_rptr(gpu, ring);
>   }
>  
> - OUT_PKT7(ring, CP_SET_MARKER, 1);
> - OUT_RING(ring, 0x00e); /* IB1LIST end */
> + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> + OUT_PKT7(ring, CP_SET_MARKER, 1);
> + OUT_RING(ring, 0x00e); /* IB1LIST end */
> + }
>  
>   get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
>   rbmemptr_stats(ring, index, cpcycles_end));
> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
> b/drivers/gpu/drm/msm/msm_submitqueue.c
> index 0e803125a325..9b3ffca3f3b4 100644
> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct 
> msm_file_private *ctx,
>   if (!priv->gpu)
>   return -ENODEV;
>  
> + if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
> + return -EINVAL;
> +
>   ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
>   if (ret)
>   return ret;
> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> index 3fca72f73861..f37858db34e6 100644
> --- a/include/uapi/drm/msm_drm.h
> +++ b/include/uapi/drm/msm_drm.h
> @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
>   * backwards compatibility as a "default" submitqueue
>   */
>  
> -#define MSM_SUBMITQUEUE_FLAGS (0)
> +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT0x0001
> +#define MSM_SUBMITQUEUE_FLAGS( \
> + MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> + 0)
>  
>  /*
>   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> 
> -- 
> 2.46.0
> 


Re: [PATCH v4 08/11] drm/msm/A6xx: Add traces for preemption

2024-09-20 Thread Akhil P Oommen
On Tue, Sep 17, 2024 at 01:14:18PM +0200, Antonino Maniscalco wrote:
> Add trace points corresponding to preemption being triggered and being
> completed for latency measurement purposes.
> 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c |  6 ++
>  drivers/gpu/drm/msm/msm_gpu_trace.h   | 28 
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index 77c4d5e91854..4fbc66d6860a 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -7,6 +7,7 @@
>  #include "a6xx_gpu.h"
>  #include "a6xx_gmu.xml.h"
>  #include "msm_mmu.h"
> +#include "msm_gpu_trace.h"
>  
>  /*
>   * Try to transition the preemption state from old to new. Return
> @@ -174,6 +175,8 @@ void a6xx_preempt_irq(struct msm_gpu *gpu)
>  
>   set_preempt_state(a6xx_gpu, PREEMPT_NONE);
>  
> + trace_msm_gpu_preemption_irq(a6xx_gpu->cur_ring->id);
> +
>   /*
>* Retrigger preemption to avoid a deadlock that might occur when 
> preemption
>* is skipped due to it being already in flight when requested.
> @@ -295,6 +298,9 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>*/
>   ring->restore_wptr = false;
>  
> + trace_msm_gpu_preemption_trigger(a6xx_gpu->cur_ring->id,
> + ring ? ring->id : -1);
> +
>   spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  
>   gpu_write64(gpu,
> diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
> b/drivers/gpu/drm/msm/msm_gpu_trace.h
> index ac40d857bc45..7f863282db0d 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_trace.h
> +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
> @@ -177,6 +177,34 @@ TRACE_EVENT(msm_gpu_resume,
>   TP_printk("%u", __entry->dummy)
>  );
>  
> +TRACE_EVENT(msm_gpu_preemption_trigger,
> + TP_PROTO(int ring_id_from, int ring_id_to),
> + TP_ARGS(ring_id_from, ring_id_to),
> + TP_STRUCT__entry(
> + __field(int, ring_id_from)
> + __field(int, ring_id_to)
> + ),
> + TP_fast_assign(
> + __entry->ring_id_from = ring_id_from;
> + __entry->ring_id_to = ring_id_to;
> + ),
> + TP_printk("preempting %u -> %u",
> +   __entry->ring_id_from,
> +   __entry->ring_id_to)
> +);
> +
> +TRACE_EVENT(msm_gpu_preemption_irq,
> + TP_PROTO(u32 ring_id),
> + TP_ARGS(ring_id),
> + TP_STRUCT__entry(
> + __field(u32, ring_id)
> + ),
> + TP_fast_assign(
> + __entry->ring_id = ring_id;
> + ),
> + TP_printk("preempted to %u", __entry->ring_id)
> +);
> +
>  #endif
>  
>  #undef TRACE_INCLUDE_PATH
> 
> -- 
> 2.46.0
> 


Re: [PATCH v4 07/11] drm/msm/A6xx: Use posamble to reset counters on preemption

2024-09-20 Thread Akhil P Oommen
On Tue, Sep 17, 2024 at 01:14:17PM +0200, Antonino Maniscalco wrote:
> Use the postamble to reset perf counters when switching between rings,
> except when sysprof is enabled, analogously to how they are reset
> between submissions when switching pagetables.
> 
> Signed-off-by: Antonino Maniscalco 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  6 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 57 
> +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 ++--
>  4 files changed, 80 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 355a3e210335..736f475d696f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -358,6 +358,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>  static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
>   struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
>  {
> + u64 preempt_postamble;
> +
>   OUT_PKT7(ring, CP_SET_PSEUDO_REG, 12);
>  
>   OUT_RING(ring, SMMU_INFO);
> @@ -381,6 +383,16 @@ static void a6xx_emit_set_pseudo_reg(struct 
> msm_ringbuffer *ring,
>   /* seems OK to set to 0 to disable it */
>   OUT_RING(ring, 0);
>   OUT_RING(ring, 0);
> +
> + /* Emit postamble to clear perfcounters */
> + preempt_postamble = a6xx_gpu->preempt_postamble_iova;
> +
> + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> + OUT_RING(ring, lower_32_bits(preempt_postamble));
> + OUT_RING(ring, upper_32_bits(preempt_postamble));
> + OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> +  a6xx_gpu->preempt_postamble_len) |
> +  CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
>  }
>  
>  static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 7fc994121676..ae13892c87e3 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -71,6 +71,12 @@ struct a6xx_gpu {
>   bool uses_gmem;
>   bool skip_save_restore;
>  
> + struct drm_gem_object *preempt_postamble_bo;
> + void *preempt_postamble_ptr;
> + uint64_t preempt_postamble_iova;
> + uint64_t preempt_postamble_len;
> + bool postamble_enabled;
> +
>   struct a6xx_gmu gmu;
>  
>   struct drm_gem_object *shadow_bo;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index aa4bad394f9e..77c4d5e91854 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -97,6 +97,43 @@ static void a6xx_preempt_timer(struct timer_list *t)
>   kthread_queue_work(gpu->worker, &gpu->recover_work);
>  }
>  
> +static void preempt_prepare_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> + u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
> + u32 count = 0;
> +
> + postamble[count++] = PKT7(CP_REG_RMW, 3);
> + postamble[count++] = REG_A6XX_RBBM_PERFCTR_SRAM_INIT_CMD;
> + postamble[count++] = 0;
> + postamble[count++] = 1;
> +
> + postamble[count++] = PKT7(CP_WAIT_REG_MEM, 6);
> + postamble[count++] = CP_WAIT_REG_MEM_0_FUNCTION(WRITE_EQ);
> + postamble[count++] = CP_WAIT_REG_MEM_1_POLL_ADDR_LO(
> + REG_A6XX_RBBM_PERFCTR_SRAM_INIT_STATUS);
> + postamble[count++] = CP_WAIT_REG_MEM_2_POLL_ADDR_HI(0);
> + postamble[count++] = CP_WAIT_REG_MEM_3_REF(0x1);
> + postamble[count++] = CP_WAIT_REG_MEM_4_MASK(0x1);
> + postamble[count++] = CP_WAIT_REG_MEM_5_DELAY_LOOP_CYCLES(0);
> +
> + a6xx_gpu->preempt_postamble_len = count;
> +
> + a6xx_gpu->postamble_enabled = true;
> +}
> +
> +static void preempt_disable_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> + u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
> +
> + /*
> +  * Disable the postamble by replacing the first packet header with a NOP
> +  * that covers the whole buffer.
> +  */
> + *postamble = PKT7(CP_NOP, (a6xx_gpu->preempt_postamble_len - 1));
> +
> + a6xx_gpu->postamble_enabled = false;
> +}
> +
>  void a6xx_preempt_irq(struct msm_gpu *gpu)
>  {
>   uint32_t status;
> @@ -187,6 +224,7 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>   unsigned long flags;
>   struct msm_ringbuffer *ring;
>   unsigned int cntl;

Re: [PATCH v4 05/11] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-20 Thread Akhil P Oommen
 unsigned long flags;
> + struct msm_ringbuffer *ring;
> + unsigned int cntl;
> +
> + if (gpu->nr_rings == 1)
> + return;
> +
> + /*
> +  * Lock to make sure another thread attempting preemption doesn't skip 
> it
> +  * while we are still evaluating the next ring. This makes sure the 
> other
> +  * thread does start preemption if we abort it and avoids a soft lock.
> +  */
> + spin_lock_irqsave(&a6xx_gpu->eval_lock, flags);
> +
> + /*
> +  * Try to start preemption by moving from NONE to START. If
> +  * unsuccessful, a preemption is already in flight
> +  */
> + if (!try_preempt_state(a6xx_gpu, PREEMPT_NONE, PREEMPT_START)) {
> + spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> + return;
> + }
> +
> + cntl = A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL(a6xx_gpu->preempt_level);
> +
> + if (a6xx_gpu->skip_save_restore)
> + cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_SKIP_SAVE_RESTORE;
> +
> + if (a6xx_gpu->uses_gmem)
> + cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_USES_GMEM;
> +
> + cntl |= A6XX_CP_CONTEXT_SWITCH_CNTL_STOP;
> +
> + /* Get the next ring to preempt to */
> + ring = get_next_ring(gpu);
> +
> + /*
> +  * If no ring is populated or the highest priority ring is the current
> +  * one do nothing except to update the wptr to the latest and greatest
> +  */
> + if (!ring || (a6xx_gpu->cur_ring == ring)) {
> + set_preempt_state(a6xx_gpu, PREEMPT_FINISH);
> + update_wptr(gpu, a6xx_gpu->cur_ring);
> + set_preempt_state(a6xx_gpu, PREEMPT_NONE);
> + spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> + return;
> + }
> +
> + spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> +
> + spin_lock_irqsave(&ring->preempt_lock, flags);
> +
> + struct a7xx_cp_smmu_info *smmu_info_ptr =
> + a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_SMMU_INFO;
> + struct a6xx_preempt_record *record_ptr =
> + a6xx_gpu->preempt[ring->id] + PREEMPT_OFFSET_PRIV_NON_SECURE;
> + u64 ttbr0 = ring->memptrs->ttbr0;
> + u32 context_idr = ring->memptrs->context_idr;
> +
> + smmu_info_ptr->ttbr0 = ttbr0;
> + smmu_info_ptr->context_idr = context_idr;
> + record_ptr->wptr = get_wptr(ring);
> +
> + /*
> +  * The GPU will write the wptr we set above when we preempt. Reset
> +  * restore_wptr to make sure that we don't write WPTR to the same
> +  * thing twice. It's still possible subsequent submissions will update
> +  * wptr again, in which case they will set the flag to true. This has
> +  * to be protected by the lock for setting the flag and updating wptr
> +  * to be atomic.
> +  */
> + ring->restore_wptr = false;
> +
> + spin_unlock_irqrestore(&ring->preempt_lock, flags);
> +
> + gpu_write64(gpu,
> + REG_A6XX_CP_CONTEXT_SWITCH_SMMU_INFO,
> + a6xx_gpu->preempt_iova[ring->id] + PREEMPT_OFFSET_SMMU_INFO);
> +
> + gpu_write64(gpu,
> + REG_A6XX_CP_CONTEXT_SWITCH_PRIV_NON_SECURE_RESTORE_ADDR,
> + a6xx_gpu->preempt_iova[ring->id] + 
> PREEMPT_OFFSET_PRIV_NON_SECURE);
> +
> + a6xx_gpu->next_ring = ring;
> +
> + /* Start a timer to catch a stuck preemption */
> + mod_timer(&a6xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(1));
> +
> + /* Set the preemption state to triggered */
> + set_preempt_state(a6xx_gpu, PREEMPT_TRIGGERED);
> +
> + /* Trigger the preemption */
> + gpu_write(gpu, REG_A6XX_CP_CONTEXT_SWITCH_CNTL, cntl);
> +}
> +
> +static int preempt_init_ring(struct a6xx_gpu *a6xx_gpu,
> + struct msm_ringbuffer *ring)
> +{
> + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> + struct msm_gpu *gpu = &adreno_gpu->base;
> + struct drm_gem_object *bo = NULL;
> + phys_addr_t ttbr;
> + u64 iova = 0;
> + void *ptr;
> + int asid;
> +
> + ptr = msm_gem_kernel_new(gpu->dev,
> + PREEMPT_SIZE(adreno_gpu->info->preempt_record_size),
> + MSM_BO_WC | MSM_BO_MAP_PRIV, gpu->aspace, &bo, &iova);
> +
> + if (IS_ERR(ptr))
> + return PTR_ERR(ptr);
> +
> + memset(ptr, 0, PREEMPT_SIZE(adreno_gpu->info->preempt_record_size));
> +
> + msm_gem_object_set_name(bo, "preempt_record");

I wish we could add the ring id too. Anyway,

Reviewed-by: Akhil P Oommen 

-Akhil
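[Editor's note: something like the line below would do it, since msm_gem_object_set_name() takes a printf-style format string; just a suggestion, not part of this patch:

	msm_gem_object_set_name(bo, "preempt_record ring%d", ring->id);
]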

> +
> + a6xx_gpu->preempt_bo[ring->id] = bo;
> + a6xx_gpu->preempt_iova[ring->id] = iova;
> + a6xx_gpu->preempt[ring->id] = ptr;
> +
> + struct a7xx_cp_smmu_info *smmu_info_ptr = ptr + 
> PREEMPT_OFFSET_SMMU_INFO;
> + struct a6xx_preempt_record *record_ptr = ptr + 
> PREEMPT_OFFSET_PRIV_NON_SECURE;
> +
> + msm_iommu_pagetable_params(gpu->aspace->mmu, &ttbr, &asid);
> +
> + smmu_info_ptr->magic = GEN7_CP_SMMU_INFO_MAGIC;
> + smmu_info_ptr->ttbr0 = ttbr;
> + smmu_info_ptr->asid = 0xdecafbad;
> + smmu_info_ptr->context_idr = 0;
> +
> + /* Set up the defaults on the preemption record */
> + record_ptr->magic = A6XX_PREEMPT_RECORD_MAGIC;
> + record_ptr->info = 0;
> + record_ptr->data = 0;
> + record_ptr->rptr = 0;
> + record_ptr->wptr = 0;
> + record_ptr->cntl = MSM_GPU_RB_CNTL_DEFAULT;
> + record_ptr->rbase = ring->iova;
> + record_ptr->counter = 0;
> + record_ptr->bv_rptr_addr = rbmemptr(ring, bv_rptr);
> +
> + return 0;
> +}
> +
> +void a6xx_preempt_fini(struct msm_gpu *gpu)
> +{
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> + int i;
> +
> + for (i = 0; i < gpu->nr_rings; i++)
> + msm_gem_kernel_put(a6xx_gpu->preempt_bo[i], gpu->aspace);
> +}
> +
> +void a6xx_preempt_init(struct msm_gpu *gpu)
> +{
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> + int i;
> +
> + /* No preemption if we only have one ring */
> + if (gpu->nr_rings <= 1)
> + return;
> +
> + for (i = 0; i < gpu->nr_rings; i++) {
> + if (preempt_init_ring(a6xx_gpu, gpu->rb[i]))
> + goto fail;
> + }
> +
> + /* TODO: make this configurable? */
> + a6xx_gpu->preempt_level = 1;
> + a6xx_gpu->uses_gmem = 1;
> + a6xx_gpu->skip_save_restore = 1;
> +
> + timer_setup(&a6xx_gpu->preempt_timer, a6xx_preempt_timer, 0);
> +
> + return;
> +fail:
> + /*
> +  * On any failure our adventure is over. Clean up and
> +  * set nr_rings to 1 to force preemption off
> +  */
> + a6xx_preempt_fini(gpu);
> + gpu->nr_rings = 1;
> +
> + DRM_DEV_ERROR(&gpu->pdev->dev,
> +   "preemption init failed, disabling 
> preemption\n");
> +
> + return;
> +}
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
> b/drivers/gpu/drm/msm/msm_ringbuffer.h
> index 174f83137a49..d1e49f701c81 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.h
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
> @@ -36,6 +36,7 @@ struct msm_rbmemptrs {
>  
>   volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
>   volatile u64 ttbr0;
> + volatile u32 context_idr;
>  };
>  
>  struct msm_cp_state {
> @@ -101,6 +102,12 @@ struct msm_ringbuffer {
>*/
>   spinlock_t preempt_lock;
>  
> + /*
> +  * Whether we skipped writing wptr and it needs to be updated in the
> +  * future when the ring becomes current.
> +  */
> + bool restore_wptr;
> +
>   /**
>* cur_ctx_seqno:
>*
> 
> -- 
> 2.46.0
> 


Re: [PATCH v4 00/11] Preemption support for A7XX

2024-09-20 Thread Akhil P Oommen
On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
>  wrote:
> >
> > Hi,
> >
> > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > This series implements preemption for A7XX targets, which allows the GPU 
> > > to
> > > switch to a higher priority ring when work is pushed to it, reducing 
> > > latency
> > > for high priority submissions.
> > >
> > > This series enables L1 preemption with skip_save_restore which requires
> > > the following userspace patches to function:
> > >
> > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > >
> > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > from compatible userspace to be preempted, therefore maintaining
> > > compatibility.
> > >
> > > Preemption is currently only enabled by default on A750, it can be
> > > enabled on other targets through the `enable_preemption` module
> > > parameter. This is because more testing is required on other targets.
> > >
> > > For testing on other HW it is sufficient to set that parameter to a
> > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > allows running any application as high priority, therefore preempting
> > > submissions from other applications.
> > >
> > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > added in this series can be used to observe preemption's behavior as
> > > well as measuring preemption latency.
> > >
> > > Some commits from this series are based on a previous series to enable
> > > preemption on A6XX targets:
> > >
> > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> > >
> > > Signed-off-by: Antonino Maniscalco 
> > > ---
> > > Changes in v4:
> > > - Added missing register in pwrup list
> > > - Removed and rearrange barriers
> > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > - Track ctx seqno per ring
> > > - Removed secure preempt context
> > > - NOP out postamble to disable it instantly
> > > - Only emit pwrup reglist once
> > > - Document bv_rptr_addr
> > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > - Set name on preempt record buffer
> > > - Link to v3: 
> > > https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f...@gmail.com
> > >
> > > Changes in v3:
> > > - Added documentation about preemption
> > > - Use quirks to determine which target supports preemption
> > > - Add a module parameter to force disabling or enabling preemption
> > > - Clear postamble when profiling
> > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > - Make preemption records MAP_PRIV
> > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > >anymore
> > > - Link to v2: 
> > > https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> > >
> > > Changes in v2:
> > > - Added preempt_record_size for X185 in PATCH 3/7
> > > - Added patches to reset perf counters
> > > - Dropped unused defines
> > > - Dropped unused variable (fixes warning)
> > > - Only enable preemption on a750
> > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > - Added Neil's Tested-By tags
> > > - Added explanation for UAPI changes in commit message
> > > - Link to v1: 
> > > https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> > >
> > > ---
> > > Antonino Maniscalco (11):
> > >drm/msm: Fix bv_fence being used as bv_rptr
> > >drm/msm/A6XX: Track current_ctx_seqno per ring
> > >drm/msm: Add a `preempt_record_size` field
> > >drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > >drm/msm/A6xx: Implement preemption for A7XX targets
> > >drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > >drm/msm/A6xx: Use posamble to reset counters on preemption
> > >drm/msm/A6xx: Add traces for preemption
> > >drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > >drm/msm/A6xx: Enable preemption for A750
> > >Documentation: document adreno preemption
> > >
> > >   Documentation/gpu/msm-preemption.rst   |  98 +
> > >   drivers/gpu/drm/msm/Makefile   |   1 +
> > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c  |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |   6 +-
> > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |   7 +-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 325 ++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h  | 174 
> > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c  | 440 
> > > +
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|   9 +-
> > >   drivers/gpu/drm/msm/

[PATCH 3/3] arm64: dts: qcom: sa8775p: Add gpu and gmu nodes

2024-09-17 Thread Akhil P Oommen
From: Puranam V G Tejaswi 

Add gpu and gmu nodes for sa8775p based platforms.

Signed-off-by: Puranam V G Tejaswi 
Signed-off-by: Akhil P Oommen 
---
 arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi |  8 
 arch/arm64/boot/dts/qcom/sa8775p.dtsi  | 75 ++
 2 files changed, 83 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi 
b/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
index 2a6170623ea9..a01e6675c4bb 100644
--- a/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
+++ b/arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi
@@ -407,6 +407,14 @@ queue3 {
};
 };
 
+&gpu {
+   status = "okay";
+
+   zap-shader {
+   firmware-name = "qcom/sa8775p/a663_zap.mbn";
+   };
+};
+
 &i2c11 {
clock-frequency = <40>;
pinctrl-0 = <&qup_i2c11_default>;
diff --git a/arch/arm64/boot/dts/qcom/sa8775p.dtsi 
b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
index 23f1b2e5e624..12c79135a303 100644
--- a/arch/arm64/boot/dts/qcom/sa8775p.dtsi
+++ b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
@@ -2824,6 +2824,81 @@ tcsr_mutex: hwlock@1f4 {
#hwlock-cells = <1>;
};
 
+   gpu: gpu@3d0 {
+   compatible = "qcom,adreno-663.0", "qcom,adreno";
+   reg = <0 0x03d0 0 0x4>,
+ <0 0x03d9e000 0 0x1000>,
+ <0 0x03d61000 0 0x800>;
+   reg-names = "kgsl_3d0_reg_memory",
+   "cx_mem",
+   "cx_dbgc";
+   interrupts = ;
+   iommus = <&adreno_smmu 0 0xc00>,
+<&adreno_smmu 1 0xc00>;
+   operating-points-v2 = <&gpu_opp_table>;
+   qcom,gmu = <&gmu>;
+   interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt 
SLAVE_EBI1 0>;
+   interconnect-names = "gfx-mem";
+   #cooling-cells = <2>;
+
+   status = "disabled";
+
+   zap-shader {
+   memory-region = <&pil_gpu_mem>;
+   };
+
+   gpu_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-40500 {
+   opp-hz = /bits/ 64 <40500>;
+   opp-level = 
;
+   opp-peak-kBps = <8368000>;
+   };
+
+   };
+   };
+
+   gmu: gmu@3d6a000 {
+   compatible = "qcom,adreno-gmu-663.0", "qcom,adreno-gmu";
+   reg = <0 0x03d6a000 0 0x34000>,
+   <0 0x3de 0 0x1>,
+   <0 0x0b29 0 0x1>;
+   reg-names = "gmu", "rscc", "gmu_pdc";
+   interrupts = ,
+   ;
+   interrupt-names = "hfi", "gmu";
+   clocks = <&gpucc GPU_CC_CX_GMU_CLK>,
+<&gpucc GPU_CC_CXO_CLK>,
+<&gcc GCC_DDRSS_GPU_AXI_CLK>,
+<&gcc GCC_GPU_MEMNOC_GFX_CLK>,
+<&gpucc GPU_CC_AHB_CLK>,
+<&gpucc GPU_CC_HUB_CX_INT_CLK>,
+<&gpucc GPU_CC_HLOS1_VOTE_GPU_SMMU_CLK>;
+   clock-names = "gmu",
+ "cxo",
+ "axi",
+ "memnoc",
+ "ahb",
+ "hub",
+ "smmu_vote";
+   power-domains = <&gpucc GPU_CC_CX_GDSC>,
+   <&gpucc GPU_CC_GX_GDSC>;
+   power-domain-names = "cx",
+"gx";
+   iommus = <&adreno_smmu 5 0xc00>;
+   operating-points-v2 = <&gmu_opp_table>;
+
+   gmu_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-2 {
+   opp-hz = /bits/ 64 <2>;
+   opp-level = 
;
+   };
+   };
+   };
+
gpucc: clock-controller@3d9 {
compatible = "qcom,sa8775p-gpucc";
reg = <0x0 0x03d9 0x0 0xa000>;

-- 
2.45.2



[PATCH 2/3] dt-bindings: display/msm/gmu: Add Adreno 663 GMU

2024-09-17 Thread Akhil P Oommen
From: Puranam V G Tejaswi 

Document Adreno 663 GMU in the dt-binding specification.

Signed-off-by: Puranam V G Tejaswi 
Signed-off-by: Akhil P Oommen 
---
 Documentation/devicetree/bindings/display/msm/gmu.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index b1bd372996d5..ab884e236429 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -125,6 +125,7 @@ allOf:
 enum:
   - qcom,adreno-gmu-635.0
   - qcom,adreno-gmu-660.1
+  - qcom,adreno-gmu-663.0
 then:
   properties:
 reg:

-- 
2.45.2



[PATCH 1/3] drm/msm/a6xx: Add support for A663

2024-09-17 Thread Akhil P Oommen
From: Puranam V G Tejaswi 

Add support for Adreno 663 found on sa8775p based platforms.

Signed-off-by: Puranam V G Tejaswi 
Signed-off-by: Akhil P Oommen 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  8 +++-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 33 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  5 +
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 0312b6ee0356..8d8d0d7630f0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -972,6 +972,25 @@ static const struct adreno_info a6xx_gpus[] = {
.prim_fifo_threshold = 0x00300200,
},
.address_space_size = SZ_16G,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x06060300),
+   .family = ADRENO_6XX_GEN4,
+   .fw = {
+   [ADRENO_FW_SQE] = "a660_sqe.fw",
+   [ADRENO_FW_GMU] = "a663_gmu.bin",
+   },
+   .gmem = SZ_1M + SZ_512K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
+   ADRENO_QUIRK_HAS_HW_APRIV,
+   .init = a6xx_gpu_init,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a690_hwcg,
+   .protect = &a660_protect,
+   .gmu_cgc_mode = 0x00020200,
+   .prim_fifo_threshold = 0x00300200,
+   },
+   .address_space_size = SZ_16G,
}, {
.chip_ids = ADRENO_CHIP_IDS(0x06030500),
.family = ADRENO_6XX_GEN4,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 06cab2c6fd66..e317780caeae 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -541,6 +541,12 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
gpu->ubwc_config.macrotile_mode = 1;
}
 
+   if (adreno_is_a663(gpu)) {
+   gpu->ubwc_config.highest_bank_bit = 13;
+   gpu->ubwc_config.ubwc_swizzle = 0x4;
+   gpu->ubwc_config.macrotile_mode = 1;
+   }
+
if (adreno_is_7c3(gpu)) {
gpu->ubwc_config.highest_bank_bit = 14;
gpu->ubwc_config.amsbc = 1;
@@ -1062,7 +1068,7 @@ static int hw_init(struct msm_gpu *gpu)
if (adreno_is_a690(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x90);
/* Set dualQ + disable afull for A660 GPU */
-   else if (adreno_is_a660(adreno_gpu))
+   else if (adreno_is_a660(adreno_gpu) || adreno_is_a663(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG, 0x66906);
else if (adreno_is_a7xx(adreno_gpu))
gpu_write(gpu, REG_A6XX_UCHE_CMDQ_CONFIG,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c 
b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
index cdb3f6e74d3e..f1196d66055c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
@@ -478,6 +478,37 @@ static void a660_build_bw_table(struct 
a6xx_hfi_msg_bw_table *msg)
msg->cnoc_cmds_data[1][0] =  0x6001;
 }
 
+static void a663_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
+{
+   /*
+* Send a single "off" entry just to get things running
+* TODO: bus scaling
+*/
+   msg->bw_level_num = 1;
+
+   msg->ddr_cmds_num = 3;
+   msg->ddr_wait_bitmask = 0x07;
+
+   msg->ddr_cmds_addrs[0] = 0x50004;
+   msg->ddr_cmds_addrs[1] = 0x5;
+   msg->ddr_cmds_addrs[2] = 0x500b4;
+
+   msg->ddr_cmds_data[0][0] =  0x4000;
+   msg->ddr_cmds_data[0][1] =  0x4000;
+   msg->ddr_cmds_data[0][2] =  0x4000;
+
+   /*
+* These are the CX (CNOC) votes - these are used by the GMU but the
+* votes are known and fixed for the target
+*/
+   msg->cnoc_cmds_num = 1;
+   msg->cnoc_wait_bitmask = 0x01;
+
+   msg->cnoc_cmds_addrs[0] = 0x50058;
+   msg->cnoc_cmds_data[0][0] =  0x4000;
+   msg->cnoc_cmds_data[1][0] =  0x6001;
+}
+
 static void adreno_7c3_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
 {
/*
@@ -646,6 +677,8 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
adreno_7c3_build_bw_table(&msg);
else if (adreno_is_a660(adreno_gpu))
a660_build_bw_table(&msg);
+   else if (adreno_is_a663(adreno_gpu))
+   a663_build_bw_table(&msg);
else if (adreno_is_a690(adreno_gpu))
a690_build_bw_table(&msg);
else if (adreno_is_a7

[PATCH 0/3] DRM/MSM: Support for Adreno 663 GPU

2024-09-17 Thread Akhil P Oommen
This series adds support for the Adreno 663 GPU found in SA8775P chipsets.
The closest GPU currently supported in drm-msm is the A660. The major
differences from it are:
1. gmu/zap firmwares
2. Recommended to disable Level2 swizzling

Verified kmscube with the below Mesa change [0]. This series is rebased
on top of msm-next.

Patch (1) & (2) for Rob Clark and Patch (3) for Bjorn

[0] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31211

To: Rob Clark 
To: Sean Paul 
To: Konrad Dybcio 
To: Abhinav Kumar 
To: Dmitry Baryshkov 
To: Marijn Suijten 
To: David Airlie 
To: Daniel Vetter 
To: Maarten Lankhorst 
To: Maxime Ripard 
To: Thomas Zimmermann 
To: Rob Herring 
To: Krzysztof Kozlowski 
To: Conor Dooley 
To: Bjorn Andersson 
Cc: linux-arm-...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org
Cc: devicet...@vger.kernel.org

Signed-off-by: Akhil P Oommen 
---
Puranam V G Tejaswi (3):
  drm/msm/a6xx: Add support for A663
  dt-bindings: display/msm/gmu: Add Adreno 663 GMU
  arm64: dts: qcom: sa8775p: Add gpu and gmu nodes

 .../devicetree/bindings/display/msm/gmu.yaml   |  1 +
 arch/arm64/boot/dts/qcom/sa8775p-ride.dtsi |  8 +++
 arch/arm64/boot/dts/qcom/sa8775p.dtsi  | 75 ++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 19 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  8 ++-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c  | 33 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 ++
 7 files changed, 148 insertions(+), 1 deletion(-)
---
base-commit: 15302579373ed2c8ada629e9e7bcf9569393a48d
change-id: 20240917-a663-gpu-support-b1475c828606

Best regards,
-- 
Akhil P Oommen 



Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

2024-09-17 Thread Akhil P Oommen
On Tue, Sep 17, 2024 at 03:47:09PM +0200, Konrad Dybcio wrote:
> On 13.09.2024 9:51 PM, Rob Clark wrote:
> > From: Rob Clark 
> > 
> > The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> > devices (x1-85, possibly others), it seems to pass that barrier while
> > there are still things in the event completion FIFO waiting to be
> > written back to memory.
> 
> Can we try to force-fault around here on other GPUs and perhaps
> limit this workaround?
> 
> Akhil, do we have any insight on this?

Nothing at the moment. I will check this further.

-Akhil.

> 
> Konrad


Re: [PATCH v3 04/10] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-16 Thread Akhil P Oommen
On Thu, Sep 12, 2024 at 05:48:45PM +0200, Antonino Maniscalco wrote:
> On 9/10/24 6:43 PM, Akhil P Oommen wrote:
> > On Mon, Sep 09, 2024 at 01:22:22PM +0100, Connor Abbott wrote:
> > > On Fri, Sep 6, 2024 at 9:03 PM Akhil P Oommen  
> > > wrote:
> > > > 
> > > > On Thu, Sep 05, 2024 at 04:51:22PM +0200, Antonino Maniscalco wrote:
> > > > > This patch implements preemption feature for A6xx targets, this allows
> > > > > the GPU to switch to a higher priority ringbuffer if one is ready. 
> > > > > A6XX
> > > > > hardware as such supports multiple levels of preemption granularities,
> > > > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > > > such as draw-call level or a bin boundary level preemption. This patch
> > > > > enables the basic preemption level, with more fine grained preemption
> > > > > support to follow.
> > > > > 
> > > > > Signed-off-by: Sharat Masetty 
> > > > > Signed-off-by: Antonino Maniscalco 
> > > > > Tested-by: Neil Armstrong  # on SM8650-QRD
> > > > > ---
> > > > >   drivers/gpu/drm/msm/Makefile  |   1 +
> > > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 293 
> > > > > +-
> > > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 161 
> > > > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 391 
> > > > > ++
> > > > >   drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > > > >   5 files changed, 844 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/msm/Makefile 
> > > > > b/drivers/gpu/drm/msm/Makefile
> > > > > index f5e2838c6a76..32e915109a59 100644
> > > > > --- a/drivers/gpu/drm/msm/Makefile
> > > > > +++ b/drivers/gpu/drm/msm/Makefile
> > > > > @@ -23,6 +23,7 @@ adreno-y := \
> > > > >adreno/a6xx_gpu.o \
> > > > >adreno/a6xx_gmu.o \
> > > > >adreno/a6xx_hfi.o \
> > > > > + adreno/a6xx_preempt.o \
> > > > > 
> > > > >   adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > index 32a4faa93d7f..ed0b138a2d66 100644
> > > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > @@ -16,6 +16,83 @@
> > > > > 
> > > > >   #define GPU_PAS_ID 13
> > > > > 
> > > > > +/* IFPC & Preemption static powerup restore list */
> > > > > +static const uint32_t a7xx_pwrup_reglist[] = {
> > > > > + REG_A6XX_UCHE_TRAP_BASE,
> > > > > + REG_A6XX_UCHE_TRAP_BASE + 1,
> > > > > + REG_A6XX_UCHE_WRITE_THRU_BASE,
> > > > > + REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> > > > > + REG_A6XX_UCHE_GMEM_RANGE_MIN,
> > > > > + REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> > > > > + REG_A6XX_UCHE_GMEM_RANGE_MAX,
> > > > > + REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> > > > > + REG_A6XX_UCHE_CACHE_WAYS,
> > > > > + REG_A6XX_UCHE_MODE_CNTL,
> > > > > + REG_A6XX_RB_NC_MODE_CNTL,
> > > > > + REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> > > > > + REG_A7XX_GRAS_NC_MODE_CNTL,
> > > > > + REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> > > > > + REG_A6XX_UCHE_GBIF_GX_CONFIG,
> > > > > + REG_A6XX_UCHE_CLIENT_PF,
> > > > 
> > > > REG_A6XX_TPL1_DBG_ECO_CNTL1 here. A friendly warning, missing a register
> > > > in this list (and the below list) will lead to a very frustrating debug.
> > > > 
> > > > > +};
> > > > > +
> > > > > +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> > > > > + REG_A6XX_TPL1_NC_MODE_CNTL,
> > > > > + REG_A6XX_SP_NC_MODE_CNTL,
> > > > > + REG_A6XX_CP_DBG_ECO_CNTL,
> > > > > + REG_A6XX_CP_PROTECT_CNTL,
> > > > > + REG_A6XX_CP_PROTECT(0),
> > > > > + REG_A6XX_CP_PROTECT(1),
> > > > > + REG_A6XX_CP_PROTECT(2),
> > > > > + REG_A6

Re: [PATCH v3 06/10] drm/msm/A6xx: Use postamble to reset counters on preemption

2024-09-12 Thread Akhil P Oommen
On Wed, Sep 11, 2024 at 12:35:08AM +0200, Antonino Maniscalco wrote:
> On 9/10/24 11:34 PM, Akhil P Oommen wrote:
> > On Mon, Sep 09, 2024 at 05:07:42PM +0200, Antonino Maniscalco wrote:
> > > On 9/6/24 10:08 PM, Akhil P Oommen wrote:
> > > > On Thu, Sep 05, 2024 at 04:51:24PM +0200, Antonino Maniscalco wrote:
> > > > > Use the postamble to reset perf counters when switching between rings,
> > > > > except when sysprof is enabled, analogously to how they are reset
> > > > > between submissions when switching pagetables.
> > > > > 
> > > > > Signed-off-by: Antonino Maniscalco 
> > > > > ---
> > > > >drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 20 ++-
> > > > >drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  5 +
> > > > >drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 32 
> > > > > +++
> > > > >drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 +--
> > > > >4 files changed, 61 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > index ed0b138a2d66..710ec3ce2923 100644
> > > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > > @@ -366,7 +366,8 @@ static void a6xx_submit(struct msm_gpu *gpu, 
> > > > > struct msm_gem_submit *submit)
> > > > >static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> > > > >   struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue 
> > > > > *queue)
> > > > >{
> > > > > - u64 preempt_offset_priv_secure;
> > > > > + bool sysprof = 
> > > > > refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> > > > > + u64 preempt_offset_priv_secure, preempt_postamble;
> > > > >   OUT_PKT7(ring, CP_SET_PSEUDO_REG, 15);
> > > > > @@ -398,6 +399,23 @@ static void a6xx_emit_set_pseudo_reg(struct 
> > > > > msm_ringbuffer *ring,
> > > > >   /* seems OK to set to 0 to disable it */
> > > > >   OUT_RING(ring, 0);
> > > > >   OUT_RING(ring, 0);
> > > > > +
> > > > > + /* if not profiling set postamble to clear perfcounters, else 
> > > > > clear it */
> > > > > + if (!sysprof && a6xx_gpu->preempt_postamble_len) {
> > 
> > Setting len = 0 is enough to skip processing postamble packets. So how
> > about a simpler:
> > 
> > len = a6xx_gpu->preempt_postamble_len;
> > if (sysprof)
> > len = 0;
> > 
> > OUT_PKT7(ring, CP_SET_AMBLE, 3);
> > OUT_RING(ring, lower_32_bits(preempt_postamble));
> > OUT_RING(ring, upper_32_bits(preempt_postamble));
> > OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(len) |
> > CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> > 
> > > > > + preempt_postamble = a6xx_gpu->preempt_postamble_iova;
> > > > > +
> > > > > + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> > > > > + OUT_RING(ring, lower_32_bits(preempt_postamble));
> > > > > + OUT_RING(ring, upper_32_bits(preempt_postamble));
> > > > > + OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> > > > > + 
> > > > > a6xx_gpu->preempt_postamble_len) |
> > > > > + CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> > > > > + } else {
> > > > 
> > > > Why do we need this else part?
> > > 
> > > Wouldn't the postamble remain set if we don't explicitly set it to 0?
> > 
> > Aah, that is a genuine concern. I am not sure! Let's keep it.
> > 
> > > 
> > > > 
> > > > > + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> > > > > + OUT_RING(ring, 0);
> > > > > + OUT_RING(ring, 0);
> > > > > + OUT_RING(ring, CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> > > > > + }
> > > > >}
> > > > >static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit 
> > > > > *submit)
> > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
> > > > > b/d

Re: [PATCH v3 06/10] drm/msm/A6xx: Use postamble to reset counters on preemption

2024-09-10 Thread Akhil P Oommen
On Mon, Sep 09, 2024 at 05:07:42PM +0200, Antonino Maniscalco wrote:
> On 9/6/24 10:08 PM, Akhil P Oommen wrote:
> > On Thu, Sep 05, 2024 at 04:51:24PM +0200, Antonino Maniscalco wrote:
> > > Use the postamble to reset perf counters when switching between rings,
> > > except when sysprof is enabled, analogously to how they are reset
> > > between submissions when switching pagetables.
> > > 
> > > Signed-off-by: Antonino Maniscalco 
> > > ---
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 20 ++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  5 +
> > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 32 
> > > +++
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 +--
> > >   4 files changed, 61 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index ed0b138a2d66..710ec3ce2923 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -366,7 +366,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit)
> > >   static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> > >   struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue 
> > > *queue)
> > >   {
> > > - u64 preempt_offset_priv_secure;
> > > + bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> > > + u64 preempt_offset_priv_secure, preempt_postamble;
> > >   OUT_PKT7(ring, CP_SET_PSEUDO_REG, 15);
> > > @@ -398,6 +399,23 @@ static void a6xx_emit_set_pseudo_reg(struct 
> > > msm_ringbuffer *ring,
> > >   /* seems OK to set to 0 to disable it */
> > >   OUT_RING(ring, 0);
> > >   OUT_RING(ring, 0);
> > > +
> > > + /* if not profiling set postamble to clear perfcounters, else clear it 
> > > */
> > > + if (!sysprof && a6xx_gpu->preempt_postamble_len) {

Setting len = 0 is enough to skip processing postamble packets. So how
about a simpler:

len = a6xx_gpu->preempt_postamble_len;
if (sysprof)
len = 0;

OUT_PKT7(ring, CP_SET_AMBLE, 3);
OUT_RING(ring, lower_32_bits(preempt_postamble));
OUT_RING(ring, upper_32_bits(preempt_postamble));
OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(len) |
CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));

> > > + preempt_postamble = a6xx_gpu->preempt_postamble_iova;
> > > +
> > > + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> > > + OUT_RING(ring, lower_32_bits(preempt_postamble));
> > > + OUT_RING(ring, upper_32_bits(preempt_postamble));
> > > + OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> > > + a6xx_gpu->preempt_postamble_len) |
> > > + CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> > > + } else {
> > 
> > Why do we need this else part?
> 
> Wouldn't the postamble remain set if we don't explicitly set it to 0?

Aah, that is a genuine concern. I am not sure! Let's keep it.

> 
> > 
> > > + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> > > + OUT_RING(ring, 0);
> > > + OUT_RING(ring, 0);
> > > + OUT_RING(ring, CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> > > + }
> > >   }
> > >   static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit 
> > > *submit)
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > index da10060e38dc..b009732c08c5 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > @@ -71,6 +71,11 @@ struct a6xx_gpu {
> > >   bool uses_gmem;
> > >   bool skip_save_restore;
> > > + struct drm_gem_object *preempt_postamble_bo;
> > > + void *preempt_postamble_ptr;
> > > + uint64_t preempt_postamble_iova;
> > > + uint64_t preempt_postamble_len;
> > > +
> > >   struct a6xx_gmu gmu;
> > >   struct drm_gem_object *shadow_bo;
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> > > index 1caff76aca6e..ec44f44d925f 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> > > @@ -34

Re: [PATCH v3 04/10] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-10 Thread Akhil P Oommen
On Mon, Sep 09, 2024 at 07:40:07AM -0700, Rob Clark wrote:
> On Mon, Sep 9, 2024 at 6:43 AM Connor Abbott  wrote:
> >
> > On Mon, Sep 9, 2024 at 2:15 PM Antonino Maniscalco
> >  wrote:
> > >
> > > On 9/6/24 9:54 PM, Akhil P Oommen wrote:
> > > > On Thu, Sep 05, 2024 at 04:51:22PM +0200, Antonino Maniscalco wrote:
> > > >> This patch implements preemption feature for A6xx targets, this allows
> > > >> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > >> hardware as such supports multiple levels of preemption granularities,
> > > >> ranging from coarse grained(ringbuffer level) to a more fine grained
> > > >> such as draw-call level or a bin boundary level preemption. This patch
> > > >> enables the basic preemption level, with more fine grained preemption
> > > >> support to follow.
> > > >>
> > > >> Signed-off-by: Sharat Masetty 
> > > >> Signed-off-by: Antonino Maniscalco 
> > > >> Tested-by: Neil Armstrong  # on SM8650-QRD
> > > >> ---
> > > >>   drivers/gpu/drm/msm/Makefile  |   1 +
> > > >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 293 
> > > >> +-
> > > >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 161 
> > > ...
> > > >
> > > > we can use the lighter smp variant here.
> > > >
> > > >> +
> > > >> +if (a6xx_gpu->cur_ring == ring)
> > > >> +gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
> > > >> +else
> > > >> +ring->skip_inline_wptr = true;
> > > >> +} else {
> > > >> +ring->skip_inline_wptr = true;
> > > >> +}
> > > >> +
> > > >> +spin_unlock_irqrestore(&ring->preempt_lock, flags);
> > > >>   }
> > > >>
> > > >>   static void get_stats_counter(struct msm_ringbuffer *ring, u32 
> > > >> counter,
> > > >> @@ -138,12 +231,14 @@ static void a6xx_set_pagetable(struct a6xx_gpu 
> > > >> *a6xx_gpu,
> > > >
> > > > set_pagetable checks "cur_ctx_seqno" to see if pt switch is needed or
> > > > not. This is currently not tracked separately for each ring. Can you
> > > > please check that?
> > >
> > > I totally missed that. Thanks for catching it!
> > >
> > > >
> > > > I wonder why that didn't cause any gpu errors in testing. Not sure if I
> > > > am missing something.
> > > >
> > >
> > > I think this is because, so long as a single context doesn't submit to
> > > two different rings with different priorities, we will only be incorrect
> > > in the sense that we emit more page table switches than necessary and
> > > never less. However untrusted userspace could create a context that
> > > submits to two different rings and that would lead to execution in the
> > > wrong context so we must fix this.

Yep, it would be a security bug!
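
For reference, the kind of fix being discussed is roughly the below (only a
sketch - the field name and exact placement are my assumption, the actual
patch may differ): track the last active context per ring instead of per
GPU, so a context submitting to two rings gets its pagetable programmed on
both of them:

/* msm_ringbuffer.h: remember the last context which switched its
 * pagetable in on *this* ring
 */
struct msm_ringbuffer {
	...
	uint32_t cur_ctx_seqno;
};

/* a6xx_set_pagetable(): compare against the submitting ring's copy
 * instead of a single GPU-wide value
 */
if (ctx->seqno == ring->cur_ctx_seqno)
	return;
...
ring->cur_ctx_seqno = ctx->seqno;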

-Akhil

> >
> > FWIW, in Mesa in the future we may want to expose multiple Vulkan
> > queues per device. Then this would definitely blow up.
> 
> This will actually be required by future android versions, with the
> switch to vk hwui backend (because apparently locking is hard, the
> solution was to use different queue's for different threads)
> 
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/11326
> 
> BR,
> -R


Re: [PATCH v3 04/10] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-10 Thread Akhil P Oommen
On Mon, Sep 09, 2024 at 01:22:22PM +0100, Connor Abbott wrote:
> On Fri, Sep 6, 2024 at 9:03 PM Akhil P Oommen  
> wrote:
> >
> > On Thu, Sep 05, 2024 at 04:51:22PM +0200, Antonino Maniscalco wrote:
> > > This patch implements preemption feature for A6xx targets, this allows
> > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > hardware as such supports multiple levels of preemption granularities,
> > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > such as draw-call level or a bin boundary level preemption. This patch
> > > enables the basic preemption level, with more fine grained preemption
> > > support to follow.
> > >
> > > Signed-off-by: Sharat Masetty 
> > > Signed-off-by: Antonino Maniscalco 
> > > Tested-by: Neil Armstrong  # on SM8650-QRD
> > > ---
> > >  drivers/gpu/drm/msm/Makefile  |   1 +
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 293 +-
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 161 
> > >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 391 
> > > ++
> > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > >  5 files changed, 844 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> > > index f5e2838c6a76..32e915109a59 100644
> > > --- a/drivers/gpu/drm/msm/Makefile
> > > +++ b/drivers/gpu/drm/msm/Makefile
> > > @@ -23,6 +23,7 @@ adreno-y := \
> > >   adreno/a6xx_gpu.o \
> > >   adreno/a6xx_gmu.o \
> > >   adreno/a6xx_hfi.o \
> > > + adreno/a6xx_preempt.o \
> > >
> > >  adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 32a4faa93d7f..ed0b138a2d66 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -16,6 +16,83 @@
> > >
> > >  #define GPU_PAS_ID 13
> > >
> > > +/* IFPC & Preemption static powerup restore list */
> > > +static const uint32_t a7xx_pwrup_reglist[] = {
> > > + REG_A6XX_UCHE_TRAP_BASE,
> > > + REG_A6XX_UCHE_TRAP_BASE + 1,
> > > + REG_A6XX_UCHE_WRITE_THRU_BASE,
> > > + REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MIN,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MAX,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> > > + REG_A6XX_UCHE_CACHE_WAYS,
> > > + REG_A6XX_UCHE_MODE_CNTL,
> > > + REG_A6XX_RB_NC_MODE_CNTL,
> > > + REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> > > + REG_A7XX_GRAS_NC_MODE_CNTL,
> > > + REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> > > + REG_A6XX_UCHE_GBIF_GX_CONFIG,
> > > + REG_A6XX_UCHE_CLIENT_PF,
> >
> > REG_A6XX_TPL1_DBG_ECO_CNTL1 here. A friendly warning, missing a register
> > in this list (and the below list) will lead to a very frustrating debug.
> >
> > > +};
> > > +
> > > +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> > > + REG_A6XX_TPL1_NC_MODE_CNTL,
> > > + REG_A6XX_SP_NC_MODE_CNTL,
> > > + REG_A6XX_CP_DBG_ECO_CNTL,
> > > + REG_A6XX_CP_PROTECT_CNTL,
> > > + REG_A6XX_CP_PROTECT(0),
> > > + REG_A6XX_CP_PROTECT(1),
> > > + REG_A6XX_CP_PROTECT(2),
> > > + REG_A6XX_CP_PROTECT(3),
> > > + REG_A6XX_CP_PROTECT(4),
> > > + REG_A6XX_CP_PROTECT(5),
> > > + REG_A6XX_CP_PROTECT(6),
> > > + REG_A6XX_CP_PROTECT(7),
> > > + REG_A6XX_CP_PROTECT(8),
> > > + REG_A6XX_CP_PROTECT(9),
> > > + REG_A6XX_CP_PROTECT(10),
> > > + REG_A6XX_CP_PROTECT(11),
> > > + REG_A6XX_CP_PROTECT(12),
> > > + REG_A6XX_CP_PROTECT(13),
> > > + REG_A6XX_CP_PROTECT(14),
> > > + REG_A6XX_CP_PROTECT(15),
> > > + REG_A6XX_CP_PROTECT(16),
> > > + REG_A6XX_CP_PROTECT(17),
> > > + REG_A6XX_CP_PROTECT(18),
> > > + REG_A6XX_CP_PROTECT(19),
> > > + REG_A6XX_CP_PROTECT(20),
> > > + REG_A6XX_CP_PROTECT(21),
> > > + REG_A6XX_CP_PROTECT(22),
> > > + REG_A6XX_CP_PROTECT(23),
> > > + REG_A6XX_CP_PROTECT(24),
> > > +  

Re: [PATCH v3 07/10] drm/msm/A6xx: Add traces for preemption

2024-09-06 Thread Akhil P Oommen
On Thu, Sep 05, 2024 at 04:51:25PM +0200, Antonino Maniscalco wrote:
> Add trace points corresponding to preemption being triggered and being
> completed for latency measurement purposes.
> 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c |  7 +++
>  drivers/gpu/drm/msm/msm_gpu_trace.h   | 28 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index ec44f44d925f..ca9d36c107f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -7,6 +7,7 @@
>  #include "a6xx_gpu.h"
>  #include "a6xx_gmu.xml.h"
>  #include "msm_mmu.h"
> +#include "msm_gpu_trace.h"
>  
>  /*
>   * Try to transition the preemption state from old to new. Return
> @@ -143,6 +144,8 @@ void a6xx_preempt_irq(struct msm_gpu *gpu)
>  
>   set_preempt_state(a6xx_gpu, PREEMPT_NONE);
>  
> + trace_msm_gpu_preemption_irq(a6xx_gpu->cur_ring->id);
> +
>   /*
>* Retrigger preemption to avoid a deadlock that might occur when 
> preemption
>* is skipped due to it being already in flight when requested.
> @@ -264,6 +267,10 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>*/
>   ring->skip_inline_wptr = false;
>  
> + trace_msm_gpu_preemption_trigger(
> + a6xx_gpu->cur_ring ? a6xx_gpu->cur_ring->id : -1,

Can't we avoid this check?
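
("ring" is already dereferenced a few lines above, so at least the
"ring ? ring->id : -1" part looks redundant. And if the preemption init
path sets the current ring up front, e.g.

	/* during preemption init, before any submit can trigger it */
	a6xx_gpu->cur_ring = gpu->rb[0];

then the cur_ring check could go away as well - though I haven't verified
whether the series relies on cur_ring being NULL before the first submit.)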

-Akhil.

> + ring ? ring->id : -1);
> +
>   spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  
>   gpu_write64(gpu,
> diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
> b/drivers/gpu/drm/msm/msm_gpu_trace.h
> index ac40d857bc45..7f863282db0d 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_trace.h
> +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
> @@ -177,6 +177,34 @@ TRACE_EVENT(msm_gpu_resume,
>   TP_printk("%u", __entry->dummy)
>  );
>  
> +TRACE_EVENT(msm_gpu_preemption_trigger,
> + TP_PROTO(int ring_id_from, int ring_id_to),
> + TP_ARGS(ring_id_from, ring_id_to),
> + TP_STRUCT__entry(
> + __field(int, ring_id_from)
> + __field(int, ring_id_to)
> + ),
> + TP_fast_assign(
> + __entry->ring_id_from = ring_id_from;
> + __entry->ring_id_to = ring_id_to;
> + ),
> + TP_printk("preempting %u -> %u",
> +   __entry->ring_id_from,
> +   __entry->ring_id_to)
> +);
> +
> +TRACE_EVENT(msm_gpu_preemption_irq,
> + TP_PROTO(u32 ring_id),
> + TP_ARGS(ring_id),
> + TP_STRUCT__entry(
> + __field(u32, ring_id)
> + ),
> + TP_fast_assign(
> + __entry->ring_id = ring_id;
> + ),
> + TP_printk("preempted to %u", __entry->ring_id)
> +);
> +
>  #endif
>  
>  #undef TRACE_INCLUDE_PATH
> 
> -- 
> 2.46.0
> 


Re: [PATCH v3 06/10] drm/msm/A6xx: Use postamble to reset counters on preemption

2024-09-06 Thread Akhil P Oommen
On Thu, Sep 05, 2024 at 04:51:24PM +0200, Antonino Maniscalco wrote:
> Use the postamble to reset perf counters when switching between rings,
> except when sysprof is enabled, analogously to how they are reset
> between submissions when switching pagetables.
> 
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 20 ++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  5 +
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 32 
> +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 +--
>  4 files changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index ed0b138a2d66..710ec3ce2923 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -366,7 +366,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>  static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
>   struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
>  {
> - u64 preempt_offset_priv_secure;
> + bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> + u64 preempt_offset_priv_secure, preempt_postamble;
>  
>   OUT_PKT7(ring, CP_SET_PSEUDO_REG, 15);
>  
> @@ -398,6 +399,23 @@ static void a6xx_emit_set_pseudo_reg(struct 
> msm_ringbuffer *ring,
>   /* seems OK to set to 0 to disable it */
>   OUT_RING(ring, 0);
>   OUT_RING(ring, 0);
> +
> + /* if not profiling set postamble to clear perfcounters, else clear it 
> */
> + if (!sysprof && a6xx_gpu->preempt_postamble_len) {
> + preempt_postamble = a6xx_gpu->preempt_postamble_iova;
> +
> + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> + OUT_RING(ring, lower_32_bits(preempt_postamble));
> + OUT_RING(ring, upper_32_bits(preempt_postamble));
> + OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> + a6xx_gpu->preempt_postamble_len) |
> + CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> + } else {

Why do we need this else part?

> + OUT_PKT7(ring, CP_SET_AMBLE, 3);
> + OUT_RING(ring, 0);
> + OUT_RING(ring, 0);
> + OUT_RING(ring, CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> + }
>  }
>  
>  static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index da10060e38dc..b009732c08c5 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -71,6 +71,11 @@ struct a6xx_gpu {
>   bool uses_gmem;
>   bool skip_save_restore;
>  
> + struct drm_gem_object *preempt_postamble_bo;
> + void *preempt_postamble_ptr;
> + uint64_t preempt_postamble_iova;
> + uint64_t preempt_postamble_len;
> +
>   struct a6xx_gmu gmu;
>  
>   struct drm_gem_object *shadow_bo;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index 1caff76aca6e..ec44f44d925f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -346,6 +346,28 @@ static int preempt_init_ring(struct a6xx_gpu *a6xx_gpu,
>   return 0;
>  }
>  
> +static void preempt_prepare_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> + u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
> + u32 count = 0;
> +
> + postamble[count++] = PKT7(CP_REG_RMW, 3);
> + postamble[count++] = REG_A6XX_RBBM_PERFCTR_SRAM_INIT_CMD;
> + postamble[count++] = 0;
> + postamble[count++] = 1;
> +
> + postamble[count++] = PKT7(CP_WAIT_REG_MEM, 6);
> + postamble[count++] = CP_WAIT_REG_MEM_0_FUNCTION(WRITE_EQ);
> + postamble[count++] = CP_WAIT_REG_MEM_1_POLL_ADDR_LO(
> + REG_A6XX_RBBM_PERFCTR_SRAM_INIT_STATUS);
> + postamble[count++] = CP_WAIT_REG_MEM_2_POLL_ADDR_HI(0);
> + postamble[count++] = CP_WAIT_REG_MEM_3_REF(0x1);
> + postamble[count++] = CP_WAIT_REG_MEM_4_MASK(0x1);
> + postamble[count++] = CP_WAIT_REG_MEM_5_DELAY_LOOP_CYCLES(0);

Isn't it better to just replace this with NOP packets when sysprof is
enabled, just before triggering preemption? That way it takes effect
immediately.
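
Something like the below is what I mean (just a sketch - the helper name is
made up, and where exactly it should be called from, the sysprof toggle or
right before the trigger, is open):

/* While sysprof is active, overwrite the shared postamble buffer with
 * 1-dword NOP packets so that even already-emitted CP_SET_AMBLE packets
 * stop resetting the counters right away; rebuild the normal
 * counter-reset sequence once sysprof is released.
 */
static void preempt_sync_postamble(struct a6xx_gpu *a6xx_gpu, bool sysprof)
{
	u32 *postamble = a6xx_gpu->preempt_postamble_ptr;
	u32 i;

	if (!sysprof) {
		preempt_prepare_postamble(a6xx_gpu);
		return;
	}

	for (i = 0; i < a6xx_gpu->preempt_postamble_len; i++)
		postamble[i] = PKT7(CP_NOP, 0);
}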

-Akhil

> +
> + a6xx_gpu->preempt_postamble_len = count;
> +}
> +
>  void a6xx_preempt_fini(struct msm_gpu *gpu)
>  {
>   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -376,6 +398,16 @@ void a6xx_preempt_init(struct msm_gpu *gpu)
>   a6xx_gpu->uses_gmem = 1;
>   a6xx_gpu->skip_save_restore = 1;
>  
> + a6xx_gpu->preempt_postamble_ptr  = msm_gem_kernel_new(gpu->dev,
> + PAGE_SIZE, MSM_BO_WC | MSM_BO_MAP_PRIV,
> + gpu->aspace, &a6xx_gpu->preempt_postamble_bo,
> + &a6xx_gpu->preempt_postambl

Re: [PATCH v3 00/10] Preemption support for A7XX

2024-09-06 Thread Akhil P Oommen
On Thu, Sep 05, 2024 at 04:51:18PM +0200, Antonino Maniscalco wrote:
> This series implements preemption for A7XX targets, which allows the GPU to
> switch to a higher priority ring when work is pushed to it, reducing latency
> for high priority submissions.
> 
> This series enables L1 preemption with skip_save_restore which requires
> the following userspace patches to function:
> 
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> 
> A flag is added to `msm_submitqueue_create` to only allow submissions
> from compatible userspace to be preempted, therefore maintaining
> compatibility.
> 
> Preemption is currently only enabled by default on A750, it can be
> enabled on other targets through the `enable_preemption` module
> parameter. This is because more testing is required on other targets.
> 
> For testing on other HW it is sufficient to set that parameter to a
> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> allows to run any application as high priority therefore preempting
> submissions from other applications.
> 
> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> added in this series can be used to observe preemption's behavior as
> well as measuring preemption latency.
> 
> Some commits from this series are based on a previous series to enable
> preemption on A6XX targets:
> 
> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> 
> Signed-off-by: Antonino Maniscalco 

Antonino, can you please test this once with per-process pt disabled to
ensure that is not broken? It is handy sometimes while debugging.
We just need to remove "adreno-smmu" compatible string from gpu smmu
node in DT.
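
(Background, from my reading of the code - the exact call sites may differ:
without the "adreno-smmu" compatible, arm-smmu-qcom does not populate the
adreno_smmu_priv callbacks, so per-process pagetable creation fails and
every context quietly falls back to the single global address space,
roughly:)

/* msm_iommu_pagetable_create(), simplified */
if (adreno_smmu->cookie)
	ttbr1_cfg = adreno_smmu->get_ttbr1_cfg(adreno_smmu->cookie);
if (!ttbr1_cfg)
	return ERR_PTR(-ENODEV);

/* ...which makes a6xx_create_private_address_space() return an error,
 * so msm_gpu_create_private_address_space() reuses gpu->aspace instead.
 */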

-Akhil.

> ---
> Changes in v3:
> - Added documentation about preemption
> - Use quirks to determine which target supports preemption
> - Add a module parameter to force disabling or enabling preemption
> - Clear postamble when profiling
> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> - Make preemption records MAP_PRIV
> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>   anymore
> - Link to v2: 
> https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> 
> Changes in v2:
> - Added preempt_record_size for X185 in PATCH 3/7
> - Added patches to reset perf counters
> - Dropped unused defines
> - Dropped unused variable (fixes warning)
> - Only enable preemption on a750
> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> - Added Neil's Tested-By tags
> - Added explanation for UAPI changes in commit message
> - Link to v1: 
> https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> 
> ---
> Antonino Maniscalco (10):
>   drm/msm: Fix bv_fence being used as bv_rptr
>   drm/msm: Add a `preempt_record_size` field
>   drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>   drm/msm/A6xx: Implement preemption for A7XX targets
>   drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>   drm/msm/A6xx: Use postamble to reset counters on preemption
>   drm/msm/A6xx: Add traces for preemption
>   drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>   drm/msm/A6xx: Enable preemption for A750
>   Documentation: document adreno preemption
> 
>  Documentation/gpu/msm-preemption.rst   |  98 +
>  drivers/gpu/drm/msm/Makefile   |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |   7 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 331 +++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h  | 166 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c  | 430 
> +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h|   9 +-
>  drivers/gpu/drm/msm/msm_drv.c  |   4 +
>  drivers/gpu/drm/msm/msm_gpu_trace.h|  28 ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h   |   8 +
>  drivers/gpu/drm/msm/msm_submitqueue.c  |   3 +
>  drivers/gpu/drm/msm/registers/adreno/a6xx.xml  |   7 +-
>  .../gpu/drm/msm/registers/adreno/adreno_pm4.xml|  39 +-
>  include/uapi/drm/msm_drm.h |   5 +-
>  14 files changed, 1094 insertions(+), 42 deletions(-)
> ---
> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> change-id: 20240815-preemption-a750-t-fcee9a844b39
> 
> Best regards,
> -- 
> Antonino Maniscalco 
> 


Re: [PATCH v3 04/10] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-06 Thread Akhil P Oommen
On Thu, Sep 05, 2024 at 04:51:22PM +0200, Antonino Maniscalco wrote:
> This patch implements preemption feature for A6xx targets, this allows
> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> hardware as such supports multiple levels of preemption granularities,
> ranging from coarse grained(ringbuffer level) to a more fine grained
> such as draw-call level or a bin boundary level preemption. This patch
> enables the basic preemption level, with more fine grained preemption
> support to follow.
> 
> Signed-off-by: Sharat Masetty 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 293 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 161 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 391 
> ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
>  5 files changed, 844 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index f5e2838c6a76..32e915109a59 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -23,6 +23,7 @@ adreno-y := \
>   adreno/a6xx_gpu.o \
>   adreno/a6xx_gmu.o \
>   adreno/a6xx_hfi.o \
> + adreno/a6xx_preempt.o \
>  
>  adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 32a4faa93d7f..ed0b138a2d66 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -16,6 +16,83 @@
>  
>  #define GPU_PAS_ID 13
>  
> +/* IFPC & Preemption static powerup restore list */
> +static const uint32_t a7xx_pwrup_reglist[] = {
> + REG_A6XX_UCHE_TRAP_BASE,
> + REG_A6XX_UCHE_TRAP_BASE + 1,
> + REG_A6XX_UCHE_WRITE_THRU_BASE,
> + REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> + REG_A6XX_UCHE_GMEM_RANGE_MIN,
> + REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> + REG_A6XX_UCHE_GMEM_RANGE_MAX,
> + REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> + REG_A6XX_UCHE_CACHE_WAYS,
> + REG_A6XX_UCHE_MODE_CNTL,
> + REG_A6XX_RB_NC_MODE_CNTL,
> + REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> + REG_A7XX_GRAS_NC_MODE_CNTL,
> + REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> + REG_A6XX_UCHE_GBIF_GX_CONFIG,
> + REG_A6XX_UCHE_CLIENT_PF,

REG_A6XX_TPL1_DBG_ECO_CNTL1 here. A friendly warning, missing a register
in this list (and the below list) will lead to a very frustrating debug.

> +};
> +
> +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> + REG_A6XX_TPL1_NC_MODE_CNTL,
> + REG_A6XX_SP_NC_MODE_CNTL,
> + REG_A6XX_CP_DBG_ECO_CNTL,
> + REG_A6XX_CP_PROTECT_CNTL,
> + REG_A6XX_CP_PROTECT(0),
> + REG_A6XX_CP_PROTECT(1),
> + REG_A6XX_CP_PROTECT(2),
> + REG_A6XX_CP_PROTECT(3),
> + REG_A6XX_CP_PROTECT(4),
> + REG_A6XX_CP_PROTECT(5),
> + REG_A6XX_CP_PROTECT(6),
> + REG_A6XX_CP_PROTECT(7),
> + REG_A6XX_CP_PROTECT(8),
> + REG_A6XX_CP_PROTECT(9),
> + REG_A6XX_CP_PROTECT(10),
> + REG_A6XX_CP_PROTECT(11),
> + REG_A6XX_CP_PROTECT(12),
> + REG_A6XX_CP_PROTECT(13),
> + REG_A6XX_CP_PROTECT(14),
> + REG_A6XX_CP_PROTECT(15),
> + REG_A6XX_CP_PROTECT(16),
> + REG_A6XX_CP_PROTECT(17),
> + REG_A6XX_CP_PROTECT(18),
> + REG_A6XX_CP_PROTECT(19),
> + REG_A6XX_CP_PROTECT(20),
> + REG_A6XX_CP_PROTECT(21),
> + REG_A6XX_CP_PROTECT(22),
> + REG_A6XX_CP_PROTECT(23),
> + REG_A6XX_CP_PROTECT(24),
> + REG_A6XX_CP_PROTECT(25),
> + REG_A6XX_CP_PROTECT(26),
> + REG_A6XX_CP_PROTECT(27),
> + REG_A6XX_CP_PROTECT(28),
> + REG_A6XX_CP_PROTECT(29),
> + REG_A6XX_CP_PROTECT(30),
> + REG_A6XX_CP_PROTECT(31),
> + REG_A6XX_CP_PROTECT(32),
> + REG_A6XX_CP_PROTECT(33),
> + REG_A6XX_CP_PROTECT(34),
> + REG_A6XX_CP_PROTECT(35),
> + REG_A6XX_CP_PROTECT(36),
> + REG_A6XX_CP_PROTECT(37),
> + REG_A6XX_CP_PROTECT(38),
> + REG_A6XX_CP_PROTECT(39),
> + REG_A6XX_CP_PROTECT(40),
> + REG_A6XX_CP_PROTECT(41),
> + REG_A6XX_CP_PROTECT(42),
> + REG_A6XX_CP_PROTECT(43),
> + REG_A6XX_CP_PROTECT(44),
> + REG_A6XX_CP_PROTECT(45),
> + REG_A6XX_CP_PROTECT(46),
> + REG_A6XX_CP_PROTECT(47),
> + REG_A6XX_CP_AHB_CNTL,
> +};
> +
> +
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>  {
>   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -68,6 +145,8 @@ static void update_shadow_rptr(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>  
>  static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  {
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   uint32_t wptr;
>   unsigned long flags;
>  
> @@ -81,12 +160,26 @@ static void a6xx_flush(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ri

Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-28 Thread Akhil P Oommen
On Wed, Aug 28, 2024 at 06:46:37AM -0700, Rob Clark wrote:
> On Wed, Aug 28, 2024 at 6:42 AM Rob Clark  wrote:
> >
> > On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
> >  wrote:
> > >
> > > On 8/27/24 11:07 PM, Rob Clark wrote:
> > > > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
> > > >  wrote:
> > > >>
> > > >> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> > > >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> > > >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott  
> > > >>>> wrote:
> > > >>>>>
> > > >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen 
> > > >>>>>  wrote:
> > > >>>>>>
> > > >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> > > >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen 
> > > >>>>>>>  wrote:
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco 
> > > >>>>>>>> wrote:
> > > >>>>>>>>> This patch implements preemption feature for A6xx targets, this 
> > > >>>>>>>>> allows
> > > >>>>>>>>> the GPU to switch to a higher priority ringbuffer if one is 
> > > >>>>>>>>> ready. A6XX
> > > >>>>>>>>> hardware as such supports multiple levels of preemption 
> > > >>>>>>>>> granularities,
> > > >>>>>>>>> ranging from coarse grained(ringbuffer level) to a more fine 
> > > >>>>>>>>> grained
> > > >>>>>>>>> such as draw-call level or a bin boundary level preemption. 
> > > >>>>>>>>> This patch
> > > >>>>>>>>> enables the basic preemption level, with more fine grained 
> > > >>>>>>>>> preemption
> > > >>>>>>>>> support to follow.
> > > >>>>>>>>>
> > > >>>>>>>>> Signed-off-by: Sharat Masetty 
> > > >>>>>>>>> Signed-off-by: Antonino Maniscalco 
> > > >>>>>>>>> ---
> > > >>>>>>>>
> > > >>>>>>>> No postamble packets which resets perfcounters? It is necessary. 
> > > >>>>>>>> Also, I
> > > >>>>>>>> think we should disable preemption during profiling like we 
> > > >>>>>>>> disable slumber.
> > > >>>>>>>>
> > > >>>>>>>> -Akhil.
> > > >>>>>>>
> > > >>>>>>> I don't see anything in kgsl which disables preemption during
> > > >>>>>>> profiling. It disables resetting perfcounters when doing 
> > > >>>>>>> system-wide
> > > >>>>>>> profiling, like freedreno, and in that case I assume preempting is
> > > >>>>>>> fine because the system profiler has a complete view of 
> > > >>>>>>> everything and
> > > >>>>>>> should "see" preemptions through the traces. For something like
> > > >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
> > > >>>>>>> because we disable saving/restoring perf counters, but that has to
> > > >>>>>>> happen in userspace because the kernel doesn't know what userspace
> > > >>>>>>> does.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> KGSL does some sort of arbitration of perfcounter configurations 
> > > >>>>>> and
> > > >>>>>> adds the select/enablement reg configuration as part of dynamic
> > > >>>>>> power up register list which we are not doing here. Is this 
> > > >>>>>> something
> > > >>>>>> you are taking care of from userspace via preamble?
> > > >>>>>>
> > > >>>>>> -A

Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-27 Thread Akhil P Oommen
On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott  wrote:
> >
> > On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen  
> > wrote:
> > >
> > > On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> > > > On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen 
> > > >  wrote:
> > > > >
> > > > > On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> > > > > > This patch implements preemption feature for A6xx targets, this 
> > > > > > allows
> > > > > > the GPU to switch to a higher priority ringbuffer if one is ready. 
> > > > > > A6XX
> > > > > > hardware as such supports multiple levels of preemption 
> > > > > > granularities,
> > > > > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > > > > such as draw-call level or a bin boundary level preemption. This 
> > > > > > patch
> > > > > > enables the basic preemption level, with more fine grained 
> > > > > > preemption
> > > > > > support to follow.
> > > > > >
> > > > > > Signed-off-by: Sharat Masetty 
> > > > > > Signed-off-by: Antonino Maniscalco 
> > > > > > ---
> > > > >
> > > > > No postamble packets which resets perfcounters? It is necessary. 
> > > > > Also, I
> > > > > think we should disable preemption during profiling like we disable 
> > > > > slumber.
> > > > >
> > > > > -Akhil.
> > > >
> > > > I don't see anything in kgsl which disables preemption during
> > > > profiling. It disables resetting perfcounters when doing system-wide
> > > > profiling, like freedreno, and in that case I assume preempting is
> > > > fine because the system profiler has a complete view of everything and
> > > > should "see" preemptions through the traces. For something like
> > > > VK_KHR_performance_query I suppose we'd want to disable preemption
> > > > because we disable saving/restoring perf counters, but that has to
> > > > happen in userspace because the kernel doesn't know what userspace
> > > > does.
> > > >
> > >
> > > KGSL does some sort of arbitration of perfcounter configurations and
> > > adds the select/enablement reg configuration as part of dynamic
> > > power up register list which we are not doing here. Is this something
> > > you are taking care of from userspace via preamble?
> > >
> > > -Akhil
> >
> > I don't think we have to take care of that in userspace, because Mesa
> > will always configure the counter registers before reading them in the
> > same submission, and if it gets preempted in the meantime then we're
> > toast anyways (due to not saving/restoring perf counters). kgsl sets
> > them from userspace, which is why it has to do something to set them
> 
> Sorry, should be "kgsl sets them from the kernel".
> 
> > after IFPC slumber or a context switch when the HW state is gone.
> > Also, because the upstream approach doesn't play nicely with system
> > profilers like perfetto, VK_KHR_performance_query is hidden by default
> > behind a debug flag in turnip. So there's already an element of "this
> > is unsupported, you have to know what you're doing to use it."

But when you have composition on GPU enabled, there will be very frequent
preemption. And I don't know how usable profiling tools will be in that
case unless you disable preemption with a Mesa debug flag. But for that
to work, all existing submitqueues should be destroyed and recreated.

So I was thinking that we can use the sysprof property to force L0
preemption from the kernel.
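
Roughly what I have in mind (a sketch only - treating sysprof_active as the
trigger and leaving out the actual CONTEXT_SWITCH_CNTL programming, which
would pick up this level):

/* Preemption level to request with the next preemption trigger: the
 * usual L1 when nobody is profiling, L0 while sysprof is active so
 * that frequent composition-driven preemptions don't disturb the
 * system-wide counters.
 */
static u32 a6xx_preempt_level(struct msm_gpu *gpu)
{
	bool sysprof = refcount_read(&gpu->sysprof_active) > 1;

	return sysprof ? 0 : 1;
}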

-Akhil.

> >
> > Connor
> >
> > >
> > > > Connor
> > > >
> > > > >
> > > > > >  drivers/gpu/drm/msm/Makefile  |   1 +
> > > > > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 
> > > > > > +-
> > > > > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
> > > > > >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 434 
> > > > > > ++
> > > > > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > > >

Re: [PATCH v7 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-26 Thread Akhil P Oommen
On Thu, Aug 22, 2024 at 04:15:24PM -0700, Rob Clark wrote:
> On Thu, Aug 22, 2024 at 1:34 PM Akhil P Oommen  
> wrote:
> >
> > On Tue, Aug 20, 2024 at 10:16:47AM -0700, Rob Clark wrote: > From: Rob 
> > Clark 
> > >
> > > In the case of iova fault triggered devcore dumps, include additional
> > > debug information based on what we think is the current page tables,
> > > including the TTBR0 value (which should match what we have in
> > > adreno_smmu_fault_info unless things have gone horribly wrong), and
> > > the pagetable entries traversed in the process of resolving the
> > > faulting iova.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
> > >  drivers/gpu/drm/msm/msm_gpu.c   |  9 +
> > >  drivers/gpu/drm/msm/msm_gpu.h   |  8 
> > >  drivers/gpu/drm/msm/msm_iommu.c | 22 ++
> > >  drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
> > >  5 files changed, 51 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > index 1c6626747b98..3848b5a64351 100644
> > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > @@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct 
> > > msm_gpu_state *state,
> > >   drm_printf(p, "  - dir=%s\n", info->flags & 
> > > IOMMU_FAULT_WRITE ? "WRITE" : "READ");
> > >   drm_printf(p, "  - type=%s\n", info->type);
> > >   drm_printf(p, "  - source=%s\n", info->block);
> > > +
> > > + /* Information extracted from what we think are the current
> > > +  * pgtables.  Hopefully the TTBR0 matches what we've 
> > > extracted
> > > +  * from the SMMU registers in smmu_info!
> > > +  */
> > > + drm_puts(p, "pgtable-fault-info:\n");
> > > + drm_printf(p, "  - ttbr0: %.16llx\n", 
> > > (u64)info->pgtbl_ttbr0);
> >
> > "0x" prefix? Otherwise, it is a bit confusing when the below one is
> > decimal.
> 
> mixed feelings, the extra 0x is annoying when pasting into calc which
> is a simple way to get binary decoding
> 
> OTOH none of this is machine decoded so I guess we could change it

On second thought, I think it is fine here since this is an address. But a
"0x" prefix is probably helpful for the pte values below.
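
i.e. keep the ttbr0 line as-is and just prefix the ptes, something like:

	drm_printf(p, "  - ptes: 0x%.16llx 0x%.16llx 0x%.16llx 0x%.16llx\n",
		   info->ptes[0], info->ptes[1], info->ptes[2], info->ptes[3]);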

> 
> > > + drm_printf(p, "  - asid: %d\n", info->asid);
> > > + drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
> > > +info->ptes[0], info->ptes[1], info->ptes[2], 
> > > info->ptes[3]);
> >
> > Does crashdec decodes this?
> 
> No, it just passed thru for human eyeballs
> 
> crashdec _does_ have some logic to flag buffers that are "near" the
> faulting iova to help identify if the fault is an underflow/overflow
> (which has been, along with the pte trail, useful to debug some
> issues)

Alright.

Reviewed-by: Akhil P Oommen 

-Akhil.
> 
> BR,
> -R
> 
> > -Akhil.
> >
> > >   }
> > >
> > >   drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
> > > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > > index 3666b42b4ecd..bf2f8b2a7ccc 100644
> > > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > > @@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct 
> > > msm_gpu *gpu,
> > >   if (submit) {
> > >   int i;
> > >
> > > + if (state->fault_info.ttbr0) {
> > > + struct msm_gpu_fault_info *info = 
> > > &state->fault_info;
> > > + struct msm_mmu *mmu = submit->aspace->mmu;
> > > +
> > > + msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
> > > +&info->asid);
> > > + msm_iommu_pagetable_walk(mmu, info->iova, 
> > > info->ptes);
> > > + }
> > > +
> > >   state->bos = kcalloc(submit->nr_bos,
> > >  

Re: [PATCH -next] drm/msm/adreno: Use kvmemdup to simplify the code

2024-08-22 Thread Akhil P Oommen
On Wed, Aug 21, 2024 at 09:21:34AM +0800, Li Zetao wrote:
> Use kvmemdup instead of kvmalloc() + memcpy() to simplify the code.
> 
> No functional change intended.
> 
> Signed-off-by: Li Zetao 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1c6626747b98..ef473ac88159 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -688,11 +688,9 @@ int adreno_gpu_state_get(struct msm_gpu *gpu, struct 
> msm_gpu_state *state)
>   size = j + 1;
>  
>   if (size) {
> - state->ring[i].data = kvmalloc(size << 2, GFP_KERNEL);
> - if (state->ring[i].data) {
> - memcpy(state->ring[i].data, gpu->rb[i]->start, 
> size << 2);
> + state->ring[i].data = kvmemdup(gpu->rb[i]->start, size 
> << 2, GFP_KERNEL);
> + if (state->ring[i].data)
>   state->ring[i].data_size = size << 2;
> - }
>   }
>   }
>  
> -- 
> 2.34.1
> 
> 


Re: [PATCH v7 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-22 Thread Akhil P Oommen
On Tue, Aug 20, 2024 at 10:16:47AM -0700, Rob Clark wrote: > From: Rob Clark 

> 
> In the case of iova fault triggered devcore dumps, include additional
> debug information based on what we think is the current page tables,
> including the TTBR0 value (which should match what we have in
> adreno_smmu_fault_info unless things have gone horribly wrong), and
> the pagetable entries traversed in the process of resolving the
> faulting iova.
> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
>  drivers/gpu/drm/msm/msm_gpu.c   |  9 +
>  drivers/gpu/drm/msm/msm_gpu.h   |  8 
>  drivers/gpu/drm/msm/msm_iommu.c | 22 ++
>  drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
>  5 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1c6626747b98..3848b5a64351 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct 
> msm_gpu_state *state,
>   drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
> "WRITE" : "READ");
>   drm_printf(p, "  - type=%s\n", info->type);
>   drm_printf(p, "  - source=%s\n", info->block);
> +
> + /* Information extracted from what we think are the current
> +  * pgtables.  Hopefully the TTBR0 matches what we've extracted
> +  * from the SMMU registers in smmu_info!
> +  */
> + drm_puts(p, "pgtable-fault-info:\n");
> + drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);

"0x" prefix? Otherwise, it is a bit confusing when the below one is
decimal.
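
i.e. something like (a sketch of the suggestion only):

	drm_printf(p, "  - ttbr0: 0x%.16llx\n", (u64)info->pgtbl_ttbr0);

(the ptes line below could get the same "0x" treatment)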

> + drm_printf(p, "  - asid: %d\n", info->asid);
> + drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
> +info->ptes[0], info->ptes[1], info->ptes[2], 
> info->ptes[3]);

Does crashdec decode this?

-Akhil.

>   }
>  
>   drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index 3666b42b4ecd..bf2f8b2a7ccc 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu 
> *gpu,
>   if (submit) {
>   int i;
>  
> + if (state->fault_info.ttbr0) {
> + struct msm_gpu_fault_info *info = &state->fault_info;
> + struct msm_mmu *mmu = submit->aspace->mmu;
> +
> + msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
> +&info->asid);
> + msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
> + }
> +
>   state->bos = kcalloc(submit->nr_bos,
>   sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
>  
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 1f02bb9956be..82e838ba8c80 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
>   int flags;
>   const char *type;
>   const char *block;
> +
> + /* Information about what we think/expect is the current SMMU state,
> +  * for example expected_ttbr0 should match smmu_info.ttbr0 which
> +  * was read back from SMMU registers.
> +  */
> + phys_addr_t pgtbl_ttbr0;
> + u64 ptes[4];
> + int asid;
>  };
>  
>  /**
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 2a94e82316f9..3e692818ba1f 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -195,6 +195,28 @@ struct iommu_domain_geometry 
> *msm_iommu_get_geometry(struct msm_mmu *mmu)
>   return &iommu->domain->geometry;
>  }
>  
> +int
> +msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
> ptes[4])
> +{
> + struct msm_iommu_pagetable *pagetable;
> + struct arm_lpae_io_pgtable_walk_data wd = {};
> +
> + if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
> + return -EINVAL;
> +
> + pagetable = to_pagetable(mmu);
> +
> + if (!pagetable->pgtbl_ops->pgtable_walk)
> + return -EINVAL;
> +
> + pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, &wd);
> +
> + for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
> + ptes[i] = wd.ptes[i];
> +
> + return 0;
> +}
> +
>  static const struct msm_mmu_funcs pagetable_funcs = {
>   .map = msm_iommu_pagetable_map,
>   .unmap = msm_iommu_pagetable_unmap,
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 88af4f490881..96e509bd96a6 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/

Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-22 Thread Akhil P Oommen
On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen  
> wrote:
> >
> > On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> > > This patch implements preemption feature for A6xx targets, this allows
> > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > hardware as such supports multiple levels of preemption granularities,
> > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > such as draw-call level or a bin boundary level preemption. This patch
> > > enables the basic preemption level, with more fine grained preemption
> > > support to follow.
> > >
> > > Signed-off-by: Sharat Masetty 
> > > Signed-off-by: Antonino Maniscalco 
> > > ---
> >
> > No postamble packets which resets perfcounters? It is necessary. Also, I
> > think we should disable preemption during profiling like we disable slumber.
> >
> > -Akhil.
> 
> I don't see anything in kgsl which disables preemption during
> profiling. It disables resetting perfcounters when doing system-wide
> profiling, like freedreno, and in that case I assume preempting is
> fine because the system profiler has a complete view of everything and
> should "see" preemptions through the traces. For something like
> VK_KHR_performance_query I suppose we'd want to disable preemption
> because we disable saving/restoring perf counters, but that has to
> happen in userspace because the kernel doesn't know what userspace
> does.
> 

KGSL does some sort of arbitration of perfcounter configurations and
adds the select/enable register writes to the dynamic power-up register
list, which we are not doing here. Is this something you are taking
care of from userspace via a preamble?

-Akhil

> Connor
> 
> >
> > >  drivers/gpu/drm/msm/Makefile  |   1 +
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
> > >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 434 
> > > ++
> > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > >  5 files changed, 924 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> > > index f5e2838c6a76..32e915109a59 100644
> > > --- a/drivers/gpu/drm/msm/Makefile
> > > +++ b/drivers/gpu/drm/msm/Makefile
> > > @@ -23,6 +23,7 @@ adreno-y := \
> > >   adreno/a6xx_gpu.o \
> > >   adreno/a6xx_gmu.o \
> > >   adreno/a6xx_hfi.o \
> > > + adreno/a6xx_preempt.o \
> > >
> > >  adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 32a4faa93d7f..1a90db5759b8 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -16,6 +16,83 @@
> > >
> > >  #define GPU_PAS_ID 13
> > >
> > > +/* IFPC & Preemption static powerup restore list */
> > > +static const uint32_t a7xx_pwrup_reglist[] = {
> > > + REG_A6XX_UCHE_TRAP_BASE,
> > > + REG_A6XX_UCHE_TRAP_BASE + 1,
> > > + REG_A6XX_UCHE_WRITE_THRU_BASE,
> > > + REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MIN,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MAX,
> > > + REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> > > + REG_A6XX_UCHE_CACHE_WAYS,
> > > + REG_A6XX_UCHE_MODE_CNTL,
> > > + REG_A6XX_RB_NC_MODE_CNTL,
> > > + REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> > > + REG_A7XX_GRAS_NC_MODE_CNTL,
> > > + REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> > > + REG_A6XX_UCHE_GBIF_GX_CONFIG,
> > > + REG_A6XX_UCHE_CLIENT_PF,
> > > +};
> > > +
> > > +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> > > + REG_A6XX_TPL1_NC_MODE_CNTL,
> > > + REG_A6XX_SP_NC_MODE_CNTL,
> > > + REG_A6XX_CP_DBG_ECO_CNTL,
> > > + REG_A6XX_CP_PROTECT_CNTL,
> > > + REG_A6XX_CP_PROTECT(0),
> > > + REG_A6XX_CP_PROTECT(1),
> > > + REG_A6XX_CP_PROTECT(2),
> > > + REG_A6XX_CP_PROTECT(3),
> > > + REG_A6XX_CP_PROTECT(4),
> > > + REG_A6XX_CP_PROTECT(5),
> >

Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-22 Thread Akhil P Oommen
On Wed, Aug 21, 2024 at 04:34:15PM +0200, Antonino Maniscalco wrote:
> On 8/19/24 10:08 PM, Akhil P Oommen wrote:
> > On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> > > This patch implements preemption feature for A6xx targets, this allows
> > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > hardware as such supports multiple levels of preemption granularities,
> > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > such as draw-call level or a bin boundary level preemption. This patch
> > > enables the basic preemption level, with more fine grained preemption
> > > support to follow.
> > > 
> > > Signed-off-by: Sharat Masetty 
> > > Signed-off-by: Antonino Maniscalco 
> > > ---
> > 
> > No postamble packets which resets perfcounters? It is necessary. Also, I
> > think we should disable preemption during profiling like we disable slumber.
> > 
> > -Akhil.
> > 
> 
> You mention that we disable slumber during profiling however I wasn't able
> to find code doing that. Can you please clarify which code you are referring
> to or a mechanism through which the kernel can know when we are profiling?
> 

Please check msm_file_private_set_sysprof().
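
For example, a rough sketch of what I mean (assuming we simply skip the
trigger while system profiling is active; the exact placement and
condition are up for discussion):

	/* Hypothetical: bail out of a6xx_preempt_trigger() while system
	 * profiling is active, similar to how slumber stays disabled.
	 */
	if (refcount_read(&gpu->sysprof_active) > 1)
		return;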

-Akhil

> Best regards,
> -- 
> Antonino Maniscalco 
> 


Re: [PATCH 6/7] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create

2024-08-22 Thread Akhil P Oommen
On Tue, Aug 20, 2024 at 11:48:33AM +0100, Connor Abbott wrote:
> On Mon, Aug 19, 2024 at 9:31 PM Akhil P Oommen  
> wrote:
> >
> > On Thu, Aug 15, 2024 at 08:26:16PM +0200, Antonino Maniscalco wrote:
> > > Some userspace changes are necessary so add a flag for userspace to
> > > advertise support for preemption.
> >
> > So the intention is to fallback to level 0 preemption until user moves
> > to Mesa libs with level 1 support for each new GPU? Please elaborate a bit.
> >
> > -Akhil.
> 
> Yes, that's right. My Mesa series fixes L1 preemption and
> skipsaverestore by changing some of the CP_SET_MARKER calls and
> register programming and introducing CP_SET_AMBLE calls and then
> enables the flag on a7xx.

And we want to control L1 preemption per submitqueue because both
freedreno and turnip may not have support ready at the same time?

Antonino, since this is a UAPI update, it is good to have these details
captured in the commit msg for reference.
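
For the commit msg (or just for the record here), the expected
userspace flow would be roughly the below. Illustrative only; it
assumes libdrm's drmCommandWriteRead() and an already opened msm DRM
fd, with 'fd' and 'queue_id' provided by the caller:

	/* Illustrative: a UMD opts in to preemption only once it emits
	 * the required handling in its cmdstream.
	 */
	struct drm_msm_submitqueue req = {
		.flags = MSM_SUBMITQUEUE_ALLOW_PREEMPT,
		.prio  = 1,
	};
	int ret = drmCommandWriteRead(fd, DRM_MSM_SUBMITQUEUE_NEW,
				      &req, sizeof(req));
	if (!ret)
		queue_id = req.id;	/* queue id returned by the kernel */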

-Akhil.

> 
> Connor
> 
> >
> > >
> > > Signed-off-by: Antonino Maniscalco 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 
> > >  include/uapi/drm/msm_drm.h|  5 -
> > >  2 files changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 1a90db5759b8..86357016db8d 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -453,8 +453,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit)
> > >   OUT_PKT7(ring, CP_SET_MARKER, 1);
> > >   OUT_RING(ring, 0x101); /* IFPC disable */
> > >
> > > - OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > - OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > + OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > + OUT_RING(ring, 0x00d); /* IB1LIST start */
> > > + }
> > >
> > >   /* Submit the commands */
> > >   for (i = 0; i < submit->nr_cmds; i++) {
> > > @@ -485,8 +487,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> > > msm_gem_submit *submit)
> > >   update_shadow_rptr(gpu, ring);
> > >   }
> > >
> > > - OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > - OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > > + OUT_PKT7(ring, CP_SET_MARKER, 1);
> > > + OUT_RING(ring, 0x00e); /* IB1LIST end */
> > > + }
> > >
> > >   get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
> > >   rbmemptr_stats(ring, index, cpcycles_end));
> > > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > > index 3fca72f73861..f37858db34e6 100644
> > > --- a/include/uapi/drm/msm_drm.h
> > > +++ b/include/uapi/drm/msm_drm.h
> > > @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
> > >   * backwards compatibility as a "default" submitqueue
> > >   */
> > >
> > > -#define MSM_SUBMITQUEUE_FLAGS (0)
> > > +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT0x0001
> > > +#define MSM_SUBMITQUEUE_FLAGS( \
> > > + MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> > > + 0)
> > >
> > >  /*
> > >   * The submitqueue priority should be between 0 and 
> > > MSM_PARAM_PRIORITIES-1,
> > >
> > > --
> > > 2.46.0
> > >
> > >


Re: [PATCH 7/7] drm/msm/A6xx: Enable preemption for A7xx targets

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:17PM +0200, Antonino Maniscalco wrote:
> Initialize with 4 rings to enable preemption.
> 
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 86357016db8d..dfcbe08f2161 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2598,7 +2598,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>   }
>  
>   if (is_a7xx)
> - ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
> + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4);

Ideally, we should test each a7x target before enabling preemption
support. We don't know for sure if the save-restore list is accurate or
the firmware used has all the necessary support for preemption.
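
Something along these lines, for example (hypothetical sketch; keying
the ring count off whether the catalog actually provides preemption
data for the target):

	if (is_a7xx) {
		/* Only expose multiple rings on a7xx targets for which
		 * the catalog provides (validated) preemption data.
		 */
		unsigned int nr_rings =
			adreno_gpu->info->preempt_record_size ? 4 : 1;

		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx,
				      nr_rings);
	}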

-Akhil.

>   else if (adreno_has_gmu_wrapper(adreno_gpu))
>   ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 
> 1);
>   else
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH 6/7] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:16PM +0200, Antonino Maniscalco wrote:
> Some userspace changes are necessary so add a flag for userspace to
> advertise support for preemption.

So the intention is to fall back to level 0 preemption until userspace
moves to Mesa libs with level 1 support for each new GPU? Please
elaborate a bit.

-Akhil.

> 
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 
>  include/uapi/drm/msm_drm.h|  5 -
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 1a90db5759b8..86357016db8d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -453,8 +453,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   OUT_PKT7(ring, CP_SET_MARKER, 1);
>   OUT_RING(ring, 0x101); /* IFPC disable */
>  
> - OUT_PKT7(ring, CP_SET_MARKER, 1);
> - OUT_RING(ring, 0x00d); /* IB1LIST start */
> + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> + OUT_PKT7(ring, CP_SET_MARKER, 1);
> + OUT_RING(ring, 0x00d); /* IB1LIST start */
> + }
>  
>   /* Submit the commands */
>   for (i = 0; i < submit->nr_cmds; i++) {
> @@ -485,8 +487,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   update_shadow_rptr(gpu, ring);
>   }
>  
> - OUT_PKT7(ring, CP_SET_MARKER, 1);
> - OUT_RING(ring, 0x00e); /* IB1LIST end */
> + if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> + OUT_PKT7(ring, CP_SET_MARKER, 1);
> + OUT_RING(ring, 0x00e); /* IB1LIST end */
> + }
>  
>   get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
>   rbmemptr_stats(ring, index, cpcycles_end));
> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> index 3fca72f73861..f37858db34e6 100644
> --- a/include/uapi/drm/msm_drm.h
> +++ b/include/uapi/drm/msm_drm.h
> @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
>   * backwards compatibility as a "default" submitqueue
>   */
>  
> -#define MSM_SUBMITQUEUE_FLAGS (0)
> +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT0x0001
> +#define MSM_SUBMITQUEUE_FLAGS( \
> + MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> + 0)
>  
>  /*
>   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH 5/7] drm/msm/A6xx: Add traces for preemption

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:15PM +0200, Antonino Maniscalco wrote:
> Add trace points corresponding to preemption being triggered and being
> completed for latency measurement purposes.
> 
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c |  7 +++
>  drivers/gpu/drm/msm/msm_gpu_trace.h   | 28 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index 0d402a3bcf5a..2606835f3c6d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -7,6 +7,7 @@
>  #include "a6xx_gpu.h"
>  #include "a6xx_gmu.xml.h"
>  #include "msm_mmu.h"
> +#include "msm_gpu_trace.h"
>  
>  #define FENCE_STATUS_WRITEDROPPED0_MASK 0x1
>  #define FENCE_STATUS_WRITEDROPPED1_MASK 0x2

These seem unused in this file. I think the previous patch added these.

> @@ -146,6 +147,8 @@ void a6xx_preempt_irq(struct msm_gpu *gpu)
>  
>   set_preempt_state(a6xx_gpu, PREEMPT_NONE);
>  
> + trace_msm_gpu_preemption_irq(a6xx_gpu->cur_ring->id);
> +
>   /*
>* Retrigger preemption to avoid a deadlock that might occur when 
> preemption
>* is skipped due to it being already in flight when requested.
> @@ -262,6 +265,10 @@ void a6xx_preempt_trigger(struct msm_gpu *gpu)
>*/
>   ring->skip_inline_wptr = false;
>  
> + trace_msm_gpu_preemption_trigger(
> + a6xx_gpu->cur_ring ? a6xx_gpu->cur_ring->id : -1,

Can't we avoid this check? I mean, the GPU has already initialized on
one of the RBs at this point.
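
i.e., assuming cur_ring is always valid once the GPU is up, this could
simply be (sketch):

	trace_msm_gpu_preemption_trigger(a6xx_gpu->cur_ring->id,
					 ring ? ring->id : -1);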

-Akhil.

> + ring ? ring->id : -1);
> +
>   spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  
>   gpu_write64(gpu,
> diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
> b/drivers/gpu/drm/msm/msm_gpu_trace.h
> index ac40d857bc45..7f863282db0d 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_trace.h
> +++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
> @@ -177,6 +177,34 @@ TRACE_EVENT(msm_gpu_resume,
>   TP_printk("%u", __entry->dummy)
>  );
>  
> +TRACE_EVENT(msm_gpu_preemption_trigger,
> + TP_PROTO(int ring_id_from, int ring_id_to),
> + TP_ARGS(ring_id_from, ring_id_to),
> + TP_STRUCT__entry(
> + __field(int, ring_id_from)
> + __field(int, ring_id_to)
> + ),
> + TP_fast_assign(
> + __entry->ring_id_from = ring_id_from;
> + __entry->ring_id_to = ring_id_to;
> + ),
> + TP_printk("preempting %u -> %u",
> +   __entry->ring_id_from,
> +   __entry->ring_id_to)
> +);
> +
> +TRACE_EVENT(msm_gpu_preemption_irq,
> + TP_PROTO(u32 ring_id),
> + TP_ARGS(ring_id),
> + TP_STRUCT__entry(
> + __field(u32, ring_id)
> + ),
> + TP_fast_assign(
> + __entry->ring_id = ring_id;
> + ),
> + TP_printk("preempted to %u", __entry->ring_id)
> +);
> +
>  #endif
>  
>  #undef TRACE_INCLUDE_PATH
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> This patch implements preemption feature for A6xx targets, this allows
> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> hardware as such supports multiple levels of preemption granularities,
> ranging from coarse grained(ringbuffer level) to a more fine grained
> such as draw-call level or a bin boundary level preemption. This patch
> enables the basic preemption level, with more fine grained preemption
> support to follow.
> 
> Signed-off-by: Sharat Masetty 
> Signed-off-by: Antonino Maniscalco 
> ---

No postamble packets which reset the perfcounters? They are necessary.
Also, I think we should disable preemption during profiling, like we
disable slumber.

-Akhil.

>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 434 
> ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
>  5 files changed, 924 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index f5e2838c6a76..32e915109a59 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -23,6 +23,7 @@ adreno-y := \
>   adreno/a6xx_gpu.o \
>   adreno/a6xx_gmu.o \
>   adreno/a6xx_hfi.o \
> + adreno/a6xx_preempt.o \
>  
>  adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 32a4faa93d7f..1a90db5759b8 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -16,6 +16,83 @@
>  
>  #define GPU_PAS_ID 13
>  
> +/* IFPC & Preemption static powerup restore list */
> +static const uint32_t a7xx_pwrup_reglist[] = {
> + REG_A6XX_UCHE_TRAP_BASE,
> + REG_A6XX_UCHE_TRAP_BASE + 1,
> + REG_A6XX_UCHE_WRITE_THRU_BASE,
> + REG_A6XX_UCHE_WRITE_THRU_BASE + 1,
> + REG_A6XX_UCHE_GMEM_RANGE_MIN,
> + REG_A6XX_UCHE_GMEM_RANGE_MIN + 1,
> + REG_A6XX_UCHE_GMEM_RANGE_MAX,
> + REG_A6XX_UCHE_GMEM_RANGE_MAX + 1,
> + REG_A6XX_UCHE_CACHE_WAYS,
> + REG_A6XX_UCHE_MODE_CNTL,
> + REG_A6XX_RB_NC_MODE_CNTL,
> + REG_A6XX_RB_CMP_DBG_ECO_CNTL,
> + REG_A7XX_GRAS_NC_MODE_CNTL,
> + REG_A6XX_RB_CONTEXT_SWITCH_GMEM_SAVE_RESTORE,
> + REG_A6XX_UCHE_GBIF_GX_CONFIG,
> + REG_A6XX_UCHE_CLIENT_PF,
> +};
> +
> +static const uint32_t a7xx_ifpc_pwrup_reglist[] = {
> + REG_A6XX_TPL1_NC_MODE_CNTL,
> + REG_A6XX_SP_NC_MODE_CNTL,
> + REG_A6XX_CP_DBG_ECO_CNTL,
> + REG_A6XX_CP_PROTECT_CNTL,
> + REG_A6XX_CP_PROTECT(0),
> + REG_A6XX_CP_PROTECT(1),
> + REG_A6XX_CP_PROTECT(2),
> + REG_A6XX_CP_PROTECT(3),
> + REG_A6XX_CP_PROTECT(4),
> + REG_A6XX_CP_PROTECT(5),
> + REG_A6XX_CP_PROTECT(6),
> + REG_A6XX_CP_PROTECT(7),
> + REG_A6XX_CP_PROTECT(8),
> + REG_A6XX_CP_PROTECT(9),
> + REG_A6XX_CP_PROTECT(10),
> + REG_A6XX_CP_PROTECT(11),
> + REG_A6XX_CP_PROTECT(12),
> + REG_A6XX_CP_PROTECT(13),
> + REG_A6XX_CP_PROTECT(14),
> + REG_A6XX_CP_PROTECT(15),
> + REG_A6XX_CP_PROTECT(16),
> + REG_A6XX_CP_PROTECT(17),
> + REG_A6XX_CP_PROTECT(18),
> + REG_A6XX_CP_PROTECT(19),
> + REG_A6XX_CP_PROTECT(20),
> + REG_A6XX_CP_PROTECT(21),
> + REG_A6XX_CP_PROTECT(22),
> + REG_A6XX_CP_PROTECT(23),
> + REG_A6XX_CP_PROTECT(24),
> + REG_A6XX_CP_PROTECT(25),
> + REG_A6XX_CP_PROTECT(26),
> + REG_A6XX_CP_PROTECT(27),
> + REG_A6XX_CP_PROTECT(28),
> + REG_A6XX_CP_PROTECT(29),
> + REG_A6XX_CP_PROTECT(30),
> + REG_A6XX_CP_PROTECT(31),
> + REG_A6XX_CP_PROTECT(32),
> + REG_A6XX_CP_PROTECT(33),
> + REG_A6XX_CP_PROTECT(34),
> + REG_A6XX_CP_PROTECT(35),
> + REG_A6XX_CP_PROTECT(36),
> + REG_A6XX_CP_PROTECT(37),
> + REG_A6XX_CP_PROTECT(38),
> + REG_A6XX_CP_PROTECT(39),
> + REG_A6XX_CP_PROTECT(40),
> + REG_A6XX_CP_PROTECT(41),
> + REG_A6XX_CP_PROTECT(42),
> + REG_A6XX_CP_PROTECT(43),
> + REG_A6XX_CP_PROTECT(44),
> + REG_A6XX_CP_PROTECT(45),
> + REG_A6XX_CP_PROTECT(46),
> + REG_A6XX_CP_PROTECT(47),
> + REG_A6XX_CP_AHB_CNTL,
> +};
> +
> +
>  static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>  {
>   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -68,6 +145,8 @@ static void update_shadow_rptr(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>  
>  static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>  {
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   uint32_t wptr;
>   unsigned long flags;
>  
> @@ -81,12 +160,26 @@ static void a6xx_flush(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>   /* Make sure to wrap

Re: [PATCH 3/7] drm/msm: Add a `preempt_record_size` field

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:13PM +0200, Antonino Maniscalco wrote:
> Adds a field to `adreno_info` to store the GPU specific preempt record
> size.
> 
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> index 68ba9aed5506..4cee54d57646 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -1190,6 +1190,7 @@ static const struct adreno_info a7xx_gpus[] = {
>   .protect = &a730_protect,
>   },
>   .address_space_size = SZ_16G,
> + .preempt_record_size = 2860 * SZ_1K,
>   }, {
>   .chip_ids = ADRENO_CHIP_IDS(0x43050a01), /* "C510v2" */
>   .family = ADRENO_7XX_GEN2,
> @@ -1209,6 +1210,7 @@ static const struct adreno_info a7xx_gpus[] = {
>   .gmu_chipid = 0x7020100,
>   },
>   .address_space_size = SZ_16G,
> + .preempt_record_size = 4192 * SZ_1K,
>   }, {
>   .chip_ids = ADRENO_CHIP_IDS(0x43050c01), /* "C512v2" */

We can use 4192KB for X185. With that,

Reviewed-by: Akhil P Oommen 

-Akhil.

>   .family = ADRENO_7XX_GEN2,
> @@ -1245,6 +1247,7 @@ static const struct adreno_info a7xx_gpus[] = {
>   .gmu_chipid = 0x7090100,
>   },
>   .address_space_size = SZ_16G,
> + .preempt_record_size = 3572 * SZ_1K,
>   }
>  };
>  DECLARE_ADRENO_GPULIST(a7xx);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 1ab523a163a0..6b1888280a83 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -111,6 +111,7 @@ struct adreno_info {
>* {SHRT_MAX, 0} sentinal.
>*/
>   struct adreno_speedbin *speedbins;
> + u64 preempt_record_size;
>  };
>  
>  #define ADRENO_CHIP_IDS(tbl...) (uint32_t[]) { tbl, 0 }
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH 2/7] drm/msm: Add submitqueue setup and close

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:12PM +0200, Antonino Maniscalco wrote:
> This patch adds a bit of infrastructure to give the different Adreno
> targets the flexibility to setup the submitqueues per their needs.
> 
> Signed-off-by: Sharat Masetty 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/msm_gpu.h |  7 +++
>  drivers/gpu/drm/msm/msm_submitqueue.c | 10 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 1f02bb9956be..70f5c18e5aee 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -92,6 +92,10 @@ struct msm_gpu_funcs {
>* for cmdstream that is buffered in this FIFO upstream of the CP fw.
>*/
>   bool (*progress)(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
> + int (*submitqueue_setup)(struct msm_gpu *gpu,
> + struct msm_gpu_submitqueue *queue);
> + void (*submitqueue_close)(struct msm_gpu *gpu,
> + struct msm_gpu_submitqueue *queue);
>  };
>  
>  /* Additional state for iommu faults: */
> @@ -522,6 +526,9 @@ struct msm_gpu_submitqueue {
>   struct mutex lock;
>   struct kref ref;
>   struct drm_sched_entity *entity;
> + struct msm_gpu *gpu;
> + struct drm_gem_object *bo;
> + uint64_t bo_iova;
>  };
>  
>  struct msm_gpu_state_bo {
> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
> b/drivers/gpu/drm/msm/msm_submitqueue.c
> index 0e803125a325..4ffb336d9a60 100644
> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> @@ -71,6 +71,11 @@ void msm_submitqueue_destroy(struct kref *kref)
>   struct msm_gpu_submitqueue *queue = container_of(kref,
>   struct msm_gpu_submitqueue, ref);
>  
> + struct msm_gpu *gpu = queue->gpu;
> +
> + if (gpu && gpu->funcs->submitqueue_close)
> + gpu->funcs->submitqueue_close(gpu, queue);
> +
>   idr_destroy(&queue->fence_idr);
>  
>   msm_file_private_put(queue->ctx);
> @@ -160,6 +165,7 @@ int msm_submitqueue_create(struct drm_device *drm, struct 
> msm_file_private *ctx,
>  {
>   struct msm_drm_private *priv = drm->dev_private;
>   struct msm_gpu_submitqueue *queue;
> + struct msm_gpu *gpu = priv->gpu;
>   enum drm_sched_priority sched_prio;
>   unsigned ring_nr;
>   int ret;
> @@ -195,6 +201,7 @@ int msm_submitqueue_create(struct drm_device *drm, struct 
> msm_file_private *ctx,
>  
>   queue->ctx = msm_file_private_get(ctx);
>   queue->id = ctx->queueid++;
> + queue->gpu = gpu;
>  
>   if (id)
>   *id = queue->id;
> @@ -207,6 +214,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct 
> msm_file_private *ctx,
>  
>   write_unlock(&ctx->queuelock);
>  
> + if (gpu && gpu->funcs->submitqueue_setup)
> + gpu->funcs->submitqueue_setup(gpu, queue);
> +
>   return 0;
>  }
>  
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH 1/7] drm/msm: Fix bv_fence being used as bv_rptr

2024-08-19 Thread Akhil P Oommen
On Thu, Aug 15, 2024 at 08:26:11PM +0200, Antonino Maniscalco wrote:
> The bv_fence field of rbmemptrs was being used incorrectly as the BV
> rptr shadow pointer in some places.
> 
> Add a bv_rptr field and change the code to use that instead.
> 
> Signed-off-by: Antonino Maniscalco 

Reviewed-by: Akhil P Oommen 

-Akhil.

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_ringbuffer.h  | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bcaec86ac67a..32a4faa93d7f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1132,7 +1132,7 @@ static int hw_init(struct msm_gpu *gpu)
>   /* ..which means "always" on A7xx, also for BV shadow */
>   if (adreno_is_a7xx(adreno_gpu)) {
>   gpu_write64(gpu, REG_A7XX_CP_BV_RB_RPTR_ADDR,
> - rbmemptr(gpu->rb[0], bv_fence));
> + rbmemptr(gpu->rb[0], bv_rptr));
>   }
>  
>   /* Always come up on rb 0 */
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
> b/drivers/gpu/drm/msm/msm_ringbuffer.h
> index 0d6beb8cd39a..40791b2ade46 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.h
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
> @@ -31,6 +31,7 @@ struct msm_rbmemptrs {
>   volatile uint32_t rptr;
>   volatile uint32_t fence;
>   /* Introduced on A7xx */
> + volatile uint32_t bv_rptr;
>   volatile uint32_t bv_fence;
>  
>   volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
> 
> -- 
> 2.46.0
> 
> 


Re: [PATCH v2 1/3] drm/msm: Use a7xx family directly in gpu_state

2024-08-12 Thread Akhil P Oommen
On Mon, Aug 12, 2024 at 07:25:14PM +0100, Connor Abbott wrote:
> On Mon, Aug 12, 2024 at 7:09 AM Akhil P Oommen  
> wrote:
> >
> > On Wed, Aug 07, 2024 at 01:34:27PM +0100, Connor Abbott wrote:
> > > With a7xx, we need to import a new header for each new generation and
> > > switch to a different list of registers, instead of making
> > > backwards-compatible changes. Using the helpers inadvertently made a750
> > > use the a740 list of registers, instead use the family directly to fix
> > > this.
> >
> > This won't scale. What about other gpus in the same generation but has a
> > different register list? You don't see that issue currently because
> > there are no support for lower tier a7x GPUs yet.
> 
> GPUs in the same generation always have the same register list. e.g.
> gen7_4_0 has the same register list as gen7_0_0. kgsl has already
> moved onto gen8 which redoes everything again and will require a
> separate codepath, they only have one more gen7 register list compared
> to us, and I doubt they'll add many more. So the kgsl approach would
> be pointless over-engineering.

https://git.codelinaro.org/clo/la/platform/vendor/qcom/opensource/graphics-kernel/-/tree/gfx-kernel.lnx.1.0.r48-rel?ref_type=heads

Not sure if there is a more recent public-facing kgsl tree than this
one, but at least this lists 2 more snapshot headers we will have to
consider in the future. And there are other a7x GPUs and a8x (even
though a8x should be a separate HWL, it is good to have a similar code
structure).

I am not saying you should do the extra engineering at this point, but I don't
think we should move things in the other direction.

-Akhil

> 
> Connor
> 
> >
> > I think we should move to a "snapshot block list" like in the downstream
> > driver if you want to simplify the whole logic. Otherwise, we should
> > leave the chipid check as it is and just fix up a750 configurations.
> >
> > -Akhil
> >
> > >
> > > Fixes: f3f8207d8aed ("drm/msm: Add devcoredump support for a750")
> > > Signed-off-by: Connor Abbott 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 
> > > ++---
> > >  1 file changed, 20 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > index 77146d30bcaa..c641ee7dec78 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > @@ -390,18 +390,18 @@ static void a7xx_get_debugbus_blocks(struct msm_gpu 
> > > *gpu,
> > >   const u32 *debugbus_blocks, *gbif_debugbus_blocks;
> > >   int i;
> > >
> > > - if (adreno_is_a730(adreno_gpu)) {
> > > + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
> > >   debugbus_blocks = gen7_0_0_debugbus_blocks;
> > >   debugbus_blocks_count = 
> > > ARRAY_SIZE(gen7_0_0_debugbus_blocks);
> > >   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
> > >   gbif_debugbus_blocks_count = 
> > > ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
> > > - } else if (adreno_is_a740_family(adreno_gpu)) {
> > > + } else if (adreno_gpu->info->family == ADRENO_7XX_GEN2) {
> > >   debugbus_blocks = gen7_2_0_debugbus_blocks;
> > >   debugbus_blocks_count = 
> > > ARRAY_SIZE(gen7_2_0_debugbus_blocks);
> > >   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
> > >   gbif_debugbus_blocks_count = 
> > > ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
> > >   } else {
> > > - BUG_ON(!adreno_is_a750(adreno_gpu));
> > > + BUG_ON(adreno_gpu->info->family != ADRENO_7XX_GEN3);
> > >   debugbus_blocks = gen7_9_0_debugbus_blocks;
> > >   debugbus_blocks_count = 
> > > ARRAY_SIZE(gen7_9_0_debugbus_blocks);
> > >   gbif_debugbus_blocks = gen7_9_0_gbif_debugbus_blocks;
> > > @@ -511,7 +511,7 @@ static void a6xx_get_debugbus(struct msm_gpu *gpu,
> > >   const struct a6xx_debugbus_block *cx_debugbus_blocks;
> > >
> > >   if (adreno_is_a7xx(adreno_gpu)) {
> > > - BUG_ON(!(adreno_is_a730(adreno_gpu) || 
> > > adreno_is_a740_family(adreno_gpu)));
> > > + BUG_ON(adreno_gpu->info->family > ADRENO_7XX_GEN3);
> > >

Re: [PATCH v2 3/3] drm/msm: Fix CP_BV_DRAW_STATE_ADDR name

2024-08-11 Thread Akhil P Oommen
On Wed, Aug 07, 2024 at 01:34:29PM +0100, Connor Abbott wrote:
> This was missed because we weren't using the a750-specific indexed regs.
> 
> Fixes: f3f8207d8aed ("drm/msm: Add devcoredump support for a750")
> Signed-off-by: Connor Abbott 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h 
> b/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h
> index 260d66eccfec..9a327d543f27 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h
> @@ -1303,7 +1303,7 @@ static struct a6xx_indexed_registers 
> gen7_9_0_cp_indexed_reg_list[] = {
>   REG_A6XX_CP_ROQ_DBG_DATA, 0x00800},
>   { "CP_UCODE_DBG_DATA", REG_A6XX_CP_SQE_UCODE_DBG_ADDR,
>   REG_A6XX_CP_SQE_UCODE_DBG_DATA, 0x08000},
> - { "CP_BV_SQE_STAT_ADDR", REG_A7XX_CP_BV_DRAW_STATE_ADDR,
> + { "CP_BV_DRAW_STATE_ADDR", REG_A7XX_CP_BV_DRAW_STATE_ADDR,
>   REG_A7XX_CP_BV_DRAW_STATE_DATA, 0x00200},
>   { "CP_BV_ROQ_DBG_ADDR", REG_A7XX_CP_BV_ROQ_DBG_ADDR,
>   REG_A7XX_CP_BV_ROQ_DBG_DATA, 0x00800},
> 
> -- 
> 2.31.1
> 
> 


Re: [PATCH v2 1/3] drm/msm: Use a7xx family directly in gpu_state

2024-08-11 Thread Akhil P Oommen
On Wed, Aug 07, 2024 at 01:34:27PM +0100, Connor Abbott wrote:
> With a7xx, we need to import a new header for each new generation and
> switch to a different list of registers, instead of making
> backwards-compatible changes. Using the helpers inadvertently made a750
> use the a740 list of registers, instead use the family directly to fix
> this.

This won't scale. What about other GPUs in the same generation that
have a different register list? You don't see that issue currently
because there is no support for lower-tier a7x GPUs yet.

I think we should move to a "snapshot block list" like in the downstream
driver if you want to simplify the whole logic. Otherwise, we should
leave the chipid check as it is and just fix up a750 configurations.
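
What I have in mind is roughly a per-target descriptor picked via the
catalog, e.g. (all names below are made up):

/* Hypothetical "snapshot block list": describe each target's register,
 * cluster and debugbus lists as data instead of branching on family.
 */
struct a7xx_snapshot_block_list {
	const u32 *debugbus_blocks;
	unsigned int nr_debugbus_blocks;
	const struct gen7_cluster_registers *clusters;
	unsigned int nr_clusters;
	const struct gen7_sptp_cluster_registers *sptp_clusters;
	unsigned int nr_sptp_clusters;
};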

-Akhil

> 
> Fixes: f3f8207d8aed ("drm/msm: Add devcoredump support for a750")
> Signed-off-by: Connor Abbott 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 
> ++---
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> index 77146d30bcaa..c641ee7dec78 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> @@ -390,18 +390,18 @@ static void a7xx_get_debugbus_blocks(struct msm_gpu 
> *gpu,
>   const u32 *debugbus_blocks, *gbif_debugbus_blocks;
>   int i;
>  
> - if (adreno_is_a730(adreno_gpu)) {
> + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
>   debugbus_blocks = gen7_0_0_debugbus_blocks;
>   debugbus_blocks_count = ARRAY_SIZE(gen7_0_0_debugbus_blocks);
>   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
>   gbif_debugbus_blocks_count = 
> ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
> - } else if (adreno_is_a740_family(adreno_gpu)) {
> + } else if (adreno_gpu->info->family == ADRENO_7XX_GEN2) {
>   debugbus_blocks = gen7_2_0_debugbus_blocks;
>   debugbus_blocks_count = ARRAY_SIZE(gen7_2_0_debugbus_blocks);
>   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
>   gbif_debugbus_blocks_count = 
> ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
>   } else {
> - BUG_ON(!adreno_is_a750(adreno_gpu));
> + BUG_ON(adreno_gpu->info->family != ADRENO_7XX_GEN3);
>   debugbus_blocks = gen7_9_0_debugbus_blocks;
>   debugbus_blocks_count = ARRAY_SIZE(gen7_9_0_debugbus_blocks);
>   gbif_debugbus_blocks = gen7_9_0_gbif_debugbus_blocks;
> @@ -511,7 +511,7 @@ static void a6xx_get_debugbus(struct msm_gpu *gpu,
>   const struct a6xx_debugbus_block *cx_debugbus_blocks;
>  
>   if (adreno_is_a7xx(adreno_gpu)) {
> - BUG_ON(!(adreno_is_a730(adreno_gpu) || 
> adreno_is_a740_family(adreno_gpu)));
> + BUG_ON(adreno_gpu->info->family > ADRENO_7XX_GEN3);
>   cx_debugbus_blocks = a7xx_cx_debugbus_blocks;
>   nr_cx_debugbus_blocks = 
> ARRAY_SIZE(a7xx_cx_debugbus_blocks);
>   } else {
> @@ -662,11 +662,11 @@ static void a7xx_get_dbgahb_clusters(struct msm_gpu 
> *gpu,
>   const struct gen7_sptp_cluster_registers *dbgahb_clusters;
>   unsigned dbgahb_clusters_size;
>  
> - if (adreno_is_a730(adreno_gpu)) {
> + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
>   dbgahb_clusters = gen7_0_0_sptp_clusters;
>   dbgahb_clusters_size = ARRAY_SIZE(gen7_0_0_sptp_clusters);
>   } else {
> - BUG_ON(!adreno_is_a740_family(adreno_gpu));
> + BUG_ON(adreno_gpu->info->family > ADRENO_7XX_GEN3);
>   dbgahb_clusters = gen7_2_0_sptp_clusters;
>   dbgahb_clusters_size = ARRAY_SIZE(gen7_2_0_sptp_clusters);
>   }
> @@ -820,14 +820,14 @@ static void a7xx_get_clusters(struct msm_gpu *gpu,
>   const struct gen7_cluster_registers *clusters;
>   unsigned clusters_size;
>  
> - if (adreno_is_a730(adreno_gpu)) {
> + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
>   clusters = gen7_0_0_clusters;
>   clusters_size = ARRAY_SIZE(gen7_0_0_clusters);
> - } else if (adreno_is_a740_family(adreno_gpu)) {
> + } else if (adreno_gpu->info->family == ADRENO_7XX_GEN2) {
>   clusters = gen7_2_0_clusters;
>   clusters_size = ARRAY_SIZE(gen7_2_0_clusters);
>   } else {
> - BUG_ON(!adreno_is_a750(adreno_gpu));
> + BUG_ON(adreno_gpu->info->family != ADRENO_7XX_GEN3);
>   clusters = gen7_9_0_clusters;
>   clusters_size = ARRAY_SIZE(gen7_9_0_clusters);
>   }
> @@ -895,7 +895,7 @@ static void a7xx_get_shader_block(struct msm_gpu *gpu,
>   if (WARN_ON(datasize > A6XX_CD_DATA_SIZE))
>   return;
>  
> - if (adreno_is_a730(adreno_gpu)) {
> + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
>   gp

Re: [PATCH 4/4] drm/msm/a5xx: workaround early ring-buffer emptiness check

2024-08-05 Thread Akhil P Oommen
On Thu, Jul 11, 2024 at 10:00:21AM +, Vladimir Lypak wrote:
> There is another cause for soft lock-up of GPU in empty ring-buffer:
> race between GPU executing last commands and CPU checking ring for
> emptiness. On GPU side IRQ for retire is triggered by CACHE_FLUSH_TS
> event and RPTR shadow (which is used to check ring emptiness) is updated
> a bit later from CP_CONTEXT_SWITCH_YIELD. Thus if GPU is executing its
> last commands slow enough or we check that ring too fast we will miss a
> chance to trigger switch to lower priority ring because current ring isn't
> empty just yet. This can escalate to lock-up situation described in
> previous patch.
> To work-around this issue we keep track of last submit sequence number
> for each ring and compare it with one written to memptrs from GPU during
> execution of CACHE_FLUSH_TS event.
> 

This is interesting! Is this just theoretical or are you able to hit
this race on your device (after picking other fixes in this series)?

-Akhil.

> Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
> Signed-off-by: Vladimir Lypak 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h | 1 +
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 4 
>  3 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 266744ee1d5f..001f11f5febc 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -65,6 +65,8 @@ void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer 
> *ring,
>  
>  static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit 
> *submit)
>  {
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
>   struct msm_ringbuffer *ring = submit->ring;
>   struct drm_gem_object *obj;
>   uint32_t *ptr, dwords;
> @@ -109,6 +111,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit
>   }
>   }
>  
> + a5xx_gpu->last_seqno[ring->id] = submit->seqno;
>   a5xx_flush(gpu, ring, true);
>   a5xx_preempt_trigger(gpu, true);
>  
> @@ -210,6 +213,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   /* Write the fence to the scratch register */
>   OUT_PKT4(ring, REG_A5XX_CP_SCRATCH_REG(2), 1);
>   OUT_RING(ring, submit->seqno);
> + a5xx_gpu->last_seqno[ring->id] = submit->seqno;
>  
>   /*
>* Execute a CACHE_FLUSH_TS event. This will ensure that the
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> index 1120824853d4..7269eaab9a7a 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> @@ -34,6 +34,7 @@ struct a5xx_gpu {
>   struct drm_gem_object *preempt_counters_bo[MSM_GPU_MAX_RINGS];
>   struct a5xx_preempt_record *preempt[MSM_GPU_MAX_RINGS];
>   uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
> + uint32_t last_seqno[MSM_GPU_MAX_RINGS];
>  
>   atomic_t preempt_state;
>   struct timer_list preempt_timer;
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> index f8d09a83c5ae..6bd92f9b2338 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> @@ -55,6 +55,8 @@ static inline void update_wptr(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>  /* Return the highest priority ringbuffer with something in it */
>  static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
>  {
> + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> + struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
>   unsigned long flags;
>   int i;
>  
> @@ -64,6 +66,8 @@ static struct msm_ringbuffer *get_next_ring(struct msm_gpu 
> *gpu)
>  
>   spin_lock_irqsave(&ring->preempt_lock, flags);
>   empty = (get_wptr(ring) == gpu->funcs->get_rptr(gpu, ring));
> + if (!empty && ring == a5xx_gpu->cur_ring)
> + empty = ring->memptrs->fence == a5xx_gpu->last_seqno[i];
>   spin_unlock_irqrestore(&ring->preempt_lock, flags);
>  
>   if (!empty)
> -- 
> 2.45.2
> 


Re: [PATCH 3/4] drm/msm/a5xx: fix races in preemption evaluation stage

2024-08-05 Thread Akhil P Oommen
On Thu, Jul 11, 2024 at 10:00:20AM +, Vladimir Lypak wrote:
> On A5XX GPUs when preemption is used it's invietable to enter a soft
> lock-up state in which GPU is stuck at empty ring-buffer doing nothing.
> This appears as full UI lockup and not detected as GPU hang (because
> it's not). This happens due to not triggering preemption when it was
> needed. Sometimes this state can be recovered by some new submit but
> generally it won't happen because applications are waiting for old
> submits to retire.
> 
> One of the reasons why this happens is a race between a5xx_submit and
> a5xx_preempt_trigger called from IRQ during submit retire. Former thread
> updates ring->cur of previously empty and not current ring right after
> latter checks it for emptiness. Then both threads can just exit because
> for first one preempt_state wasn't NONE yet and for second one all rings
> appeared to be empty.
> 
> To prevent such situations from happening we need to establish guarantee
> for preempt_trigger to be called after each submit. To implement it this
> patch adds trigger call at the end of a5xx_preempt_irq to re-check if we
> should switch to non-empty or higher priority ring. Also we find next
> ring in new preemption state "EVALUATE". If the thread that updated some
> ring with new submit sees this state it should wait until it passes.
> 
> Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
> Signed-off-by: Vladimir Lypak 

I didn't go through the other thread with Connor completely, but can you
please check if the below chunk is enough instead of this patch?

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
index f58dd564d122..d69b14ebbe44 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -47,9 +47,8 @@ static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)

spin_lock_irqsave(&ring->preempt_lock, flags);
wptr = get_wptr(ring);
-   spin_unlock_irqrestore(&ring->preempt_lock, flags);
-
gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
+   spin_unlock_irqrestore(&ring->preempt_lock, flags);
 }

 /* Return the highest priority ringbuffer with something in it */
@@ -188,6 +187,8 @@ void a5xx_preempt_irq(struct msm_gpu *gpu)
update_wptr(gpu, a5xx_gpu->cur_ring);

set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+
+   a5xx_preempt_trigger(gpu);
 }

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  6 +++---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h | 11 +++
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 24 +++
>  3 files changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 6c80d3003966..266744ee1d5f 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -110,7 +110,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit
>   }
>  
>   a5xx_flush(gpu, ring, true);
> - a5xx_preempt_trigger(gpu);
> + a5xx_preempt_trigger(gpu, true);
>  
>   /* we might not necessarily have a cmd from userspace to
>* trigger an event to know that submit has completed, so
> @@ -240,7 +240,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>   a5xx_flush(gpu, ring, false);
>  
>   /* Check to see if we need to start preemption */
> - a5xx_preempt_trigger(gpu);
> + a5xx_preempt_trigger(gpu, true);
>  }
>  
>  static const struct adreno_five_hwcg_regs {
> @@ -1296,7 +1296,7 @@ static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
>   a5xx_gpmu_err_irq(gpu);
>  
>   if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS) {
> - a5xx_preempt_trigger(gpu);
> + a5xx_preempt_trigger(gpu, false);
>   msm_gpu_retire(gpu);
>   }
>  
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> index c7187bcc5e90..1120824853d4 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
> @@ -57,10 +57,12 @@ void a5xx_debugfs_init(struct msm_gpu *gpu, struct 
> drm_minor *minor);
>   * through the process.
>   *
>   * PREEMPT_NONE - no preemption in progress.  Next state START.
> - * PREEMPT_START - The trigger is evaulating if preemption is possible. Next
> - * states: TRIGGERED, NONE
> + * PREEMPT_EVALUATE - The trigger is evaulating if preemption is possible. 
> Next
> + * states: START, ABORT
>   * PREEMPT_ABORT - An intermediate state before moving back to NONE. Next
>   * state: NONE.
> + * PREEMPT_START - The trigger is preparing for preemption. Next state:
> + * TRIGGERED
>   * PREEMPT_TRIGGERED: A preemption has been executed on the hardware. Next
>   * states: FAULTED, PENDING
>   * PREEMPT_FAULTED: A preemption timed out (never completed).

Re: [PATCH 2/4] drm/msm/a5xx: properly clear preemption records on resume

2024-08-05 Thread Akhil P Oommen
On Fri, Aug 02, 2024 at 01:41:32PM +, Vladimir Lypak wrote:
> On Thu, Aug 01, 2024 at 06:46:10PM +0530, Akhil P Oommen wrote:
> > On Thu, Jul 11, 2024 at 10:00:19AM +, Vladimir Lypak wrote:
> > > Two fields of preempt_record which are used by CP aren't reset on
> > > resume: "data" and "info". This is the reason behind faults which happen
> > > when we try to switch to the ring that was active last before suspend.
> > > In addition those faults can't be recovered from because we use suspend
> > > and resume to do so (keeping values of those fields again).
> > > 
> > > Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
> > > Signed-off-by: Vladimir Lypak 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c 
> > > b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> > > index f58dd564d122..67a8ef4adf6b 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> > > @@ -204,6 +204,8 @@ void a5xx_preempt_hw_init(struct msm_gpu *gpu)
> > >   return;
> > >  
> > >   for (i = 0; i < gpu->nr_rings; i++) {
> > > + a5xx_gpu->preempt[i]->data = 0;
> > > + a5xx_gpu->preempt[i]->info = 0;
> > 
> > I don't see this bit in the downstream driver. Just curious, do we need
> > to clear both fields to avoid the gpu faults?
> 
> Downstream gets away without doing so because it resumes on the same
> ring that it suspended on. On mainline we always do GPU resume on first
> ring. It was enough to zero info field to avoid faults but clearing
> both shouldn't hurt.
> 
> I have tried to replicate faults again with local preemption disabled
> and unmodified mesa and couldn't do so. It only happens when fine-grain
> preemption is used and there was a switch from IB1.

So, I guess the GPU is going to RPM suspend while there are pending
(preempted) submits present in another ringbuffer. Probably the other
fixes you have in this series make this not necessary during RPM
suspend. But we can keep it, as it is harmless and might help during
GPU recovery.

> This made me come up with explanation of what could be happening.
> If preemption switch is initiated on a some ring at checkpoint in IB1,
> CP should save position of that checkpoint in the preemption record and
> set some flag in "info" field which will tell it to continue from that
> checkpoint when switching back.
> When switching back to that ring we program address of its preemption
> record to CP_CONTEXT_SWITCH_RESTORE_ADDR. Apparently this won't remove
> the flag from "info" field because the preemption record is only being
> read from. This leaves preemption record outdated on that ring until
> next switch will override it. This doesn't cause issues on downstream
> because it won't try to restore from that record since it's ignored
> during GPU power-up.

I guess it is fine if you never go to rpm suspend without idling all
RBs!

-Akhil

> 
> Vladimir
> 
> > 
> > -Akhil
> > >   a5xx_gpu->preempt[i]->wptr = 0;
> > >   a5xx_gpu->preempt[i]->rptr = 0;
> > >   a5xx_gpu->preempt[i]->rbase = gpu->rb[i]->iova;
> > > -- 
> > > 2.45.2
> > > 


Re: [PATCH 2/4] drm/msm/a5xx: properly clear preemption records on resume

2024-08-01 Thread Akhil P Oommen
On Thu, Jul 11, 2024 at 10:00:19AM +, Vladimir Lypak wrote:
> Two fields of preempt_record which are used by CP aren't reset on
> resume: "data" and "info". This is the reason behind faults which happen
> when we try to switch to the ring that was active last before suspend.
> In addition those faults can't be recovered from because we use suspend
> and resume to do so (keeping values of those fields again).
> 
> Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
> Signed-off-by: Vladimir Lypak 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> index f58dd564d122..67a8ef4adf6b 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> @@ -204,6 +204,8 @@ void a5xx_preempt_hw_init(struct msm_gpu *gpu)
>   return;
>  
>   for (i = 0; i < gpu->nr_rings; i++) {
> + a5xx_gpu->preempt[i]->data = 0;
> + a5xx_gpu->preempt[i]->info = 0;

I don't see this bit in the downstream driver. Just curious, do we need
to clear both fields to avoid the gpu faults?

-Akhil
>   a5xx_gpu->preempt[i]->wptr = 0;
>   a5xx_gpu->preempt[i]->rptr = 0;
>   a5xx_gpu->preempt[i]->rbase = gpu->rb[i]->iova;
> -- 
> 2.45.2
> 


Re: [PATCH 1/4] drm/msm/a5xx: disable preemption in submits by default

2024-08-01 Thread Akhil P Oommen
On Mon, Jul 15, 2024 at 02:00:10PM -0700, Rob Clark wrote:
> On Thu, Jul 11, 2024 at 3:02 AM Vladimir Lypak  
> wrote:
> >
> > Fine grain preemption (switching from/to points within submits)
> > requires extra handling in command stream of those submits, especially
> > when rendering with tiling (using GMEM). However this handling is
> > missing at this point in mesa (and always was). For this reason we get
> > random GPU faults and hangs if more than one priority level is used
> > because local preemption is enabled prior to executing command stream
> > from submit.
> > With that said it was ahead of time to enable local preemption by
> > default considering the fact that even on downstream kernel it is only
> > enabled if requested via UAPI.
> >
> > Fixes: a7a4c19c36de ("drm/msm/a5xx: fix setting of the 
> > CP_PREEMPT_ENABLE_LOCAL register")
> > Signed-off-by: Vladimir Lypak 
> > ---
> >  drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > index c0b5373e90d7..6c80d3003966 100644
> > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > @@ -150,9 +150,13 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
> > msm_gem_submit *submit)
> > OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> > OUT_RING(ring, 1);
> >
> > -   /* Enable local preemption for finegrain preemption */
> > +   /*
> > +* Disable local preemption by default because it requires
> > +* user-space to be aware of it and provide additional handling
> > +* to restore rendering state or do various flushes on switch.
> > +*/
> > OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
> > -   OUT_RING(ring, 0x1);
> > +   OUT_RING(ring, 0x0);
> 
> From a quick look at the a530 pfp fw, it looks like
> CP_PREEMPT_ENABLE_LOCAL is allowed in IB1/IB2 (ie. not restricted to
> kernel RB).  So we should just disable it in the kernel, and let
> userspace send a CP_PREEMPT_ENABLE_LOCAL to enable local preemption.

Ack. AFAIU about a5x preemption, this should work.
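
So from the UMD side it would look roughly like this (illustrative
sketch using Mesa's ring emit helpers; packet name from the PM4
definitions):

	/* Hypothetical userspace sketch: enable fine-grained (local)
	 * preemption from the cmdstream only when the UMD also emits
	 * the required state save/restore handling.
	 */
	OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
	OUT_RING(ring, 1);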

-Akhil

> 
> BR,
> -R
> 
> > /* Allow CP_CONTEXT_SWITCH_YIELD packets in the IB2 */
> > OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
> > --
> > 2.45.2
> >


Re: [PATCH v5 1/5] drm/msm/adreno: Implement SMEM-based speed bin

2024-07-29 Thread Akhil P Oommen
On Mon, Jul 29, 2024 at 02:40:30PM +0200, Konrad Dybcio wrote:
> 
> 
> On 29.07.2024 2:13 PM, Konrad Dybcio wrote:
> > On 16.07.2024 1:56 PM, Konrad Dybcio wrote:
> >> On 15.07.2024 10:04 PM, Akhil P Oommen wrote:
> >>> On Tue, Jul 09, 2024 at 12:45:29PM +0200, Konrad Dybcio wrote:
> >>>> On recent (SM8550+) Snapdragon platforms, the GPU speed bin data is
> >>>> abstracted through SMEM, instead of being directly available in a fuse.
> >>>>
> >>>> Add support for SMEM-based speed binning, which includes getting
> >>>> "feature code" and "product code" from said source and parsing them
> >>>> to form something that lets us match OPPs against.
> >>>>
> >>>> Due to the product code being ignored in the context of Adreno on
> >>>> production parts (as of SM8650), hardcode it to SOCINFO_PC_UNKNOWN.
> >>>>
> >>>> Signed-off-by: Konrad Dybcio 
> >>>> ---
> >> [...]
> >>
> >>>>  
> >>>> -if (adreno_read_speedbin(dev, &speedbin) || !speedbin)
> >>>> +if (adreno_read_speedbin(adreno_gpu, dev, &speedbin) || 
> >>>> !speedbin)
> >>>>  speedbin = 0x;
> >>>> -adreno_gpu->speedbin = (uint16_t) (0x & speedbin);
> >>>> +adreno_gpu->speedbin = speedbin;
> >>> There are some chipsets which use both Speedbin and Socinfo data for
> >>> SKU detection [1].
> >> 0_0
> >>
> >>
> >>> We don't need to worry about that logic for now. But
> >>> I am worried about mixing Speedbin and SKU_ID in the UABI with this patch.
> >>> It will be difficult when we have to expose both to userspace.
> >>>
> >>> I think we can use a separate bitfield to expose FCODE/PCODE. Currently,
> >>> the lower 32 bit is reserved for chipid and 33-48 is reserved for 
> >>> speedbin,
> >>> so I think we can use the rest of the 16 bits for SKU_ID. And within that
> >>> 16bits, 12 bits should be sufficient for FCODE and the rest 8 bits
> >>> reserved for future PCODE.
> >> Right, sounds reasonable. Hopefully nothing overflows..
> > +CC Elliot
> > 
> > Would you know whether these sizes ^ are going to be sufficient for
> > the foreseeable future?
> 
> Also Akhil, 12 + 8 > 16.. did you mean 8 bits for both P and FCODE? Or
> 12 for FCODE and 4 for PCODE?

Sorry, "8 bits" was a typo. You are right, 12 bits for Fcode and 4 bits for 
PCODE.
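
To make that concrete, a rough sketch of the layout (macro names are
hypothetical, not the final UABI; reading "33-48" as bits 32-47 of the u64
param, and using FIELD_PREP/GENMASK_ULL from <linux/bitfield.h>):

	#define ADRENO_UABI_CHIPID(id)		FIELD_PREP(GENMASK_ULL(31, 0), (id))
	#define ADRENO_UABI_SPEEDBIN(sb)	FIELD_PREP(GENMASK_ULL(47, 32), (sb))
	#define ADRENO_UABI_FCODE(fc)		FIELD_PREP(GENMASK_ULL(59, 48), (fc))
	#define ADRENO_UABI_PCODE(pc)		FIELD_PREP(GENMASK_ULL(63, 60), (pc))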

-Akhil

> 
> Konrad


Re: [PATCH] drm/msm/adreno: Fix error return if missing firmware-name

2024-07-16 Thread Akhil P Oommen
On Tue, Jul 16, 2024 at 09:06:30AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> -ENODEV is used to signify that there is no zap shader for the platform,
> and the CPU can directly take the GPU out of secure mode.  We want to
> use this return code when there is no zap-shader node.  But not when
> there is, but without a firmware-name property.  This case we want to
> treat as-if the needed fw is not found.
> 
> Signed-off-by: Rob Clark 
> ---

Reviewed-by: Akhil P Oommen 

-Akhil

>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index b46e7e93b3ed..0d84be3be0b7 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -99,7 +99,7 @@ static int zap_shader_load_mdt(struct msm_gpu *gpu, const 
> char *fwname,
>* was a bad idea, and is only provided for backwards
>* compatibility for older targets.
>*/
> - return -ENODEV;
> + return -ENOENT;
>   }
>  
>   if (IS_ERR(fw)) {
> -- 
> 2.45.2
> 


Re: [PATCH v5 1/5] drm/msm/adreno: Implement SMEM-based speed bin

2024-07-15 Thread Akhil P Oommen
On Tue, Jul 09, 2024 at 12:45:29PM +0200, Konrad Dybcio wrote:
> On recent (SM8550+) Snapdragon platforms, the GPU speed bin data is
> abstracted through SMEM, instead of being directly available in a fuse.
> 
> Add support for SMEM-based speed binning, which includes getting
> "feature code" and "product code" from said source and parsing them
> to form something that lets us match OPPs against.
> 
> Due to the product code being ignored in the context of Adreno on
> production parts (as of SM8650), hardcode it to SOCINFO_PC_UNKNOWN.
> 
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 14 +-
>  drivers/gpu/drm/msm/adreno/adreno_device.c |  2 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c| 42 
> +++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  7 -
>  4 files changed, 54 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index bcaec86ac67a..0d8682c28ba4 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2117,18 +2117,20 @@ static u32 fuse_to_supp_hw(const struct adreno_info 
> *info, u32 fuse)
>   return UINT_MAX;
>  }
>  
> -static int a6xx_set_supported_hw(struct device *dev, const struct 
> adreno_info *info)
> +static int a6xx_set_supported_hw(struct adreno_gpu *adreno_gpu,
> +  struct device *dev,
> +  const struct adreno_info *info)
>  {
>   u32 supp_hw;
>   u32 speedbin;
>   int ret;
>  
> - ret = adreno_read_speedbin(dev, &speedbin);
> + ret = adreno_read_speedbin(adreno_gpu, dev, &speedbin);
>   /*
> -  * -ENOENT means that the platform doesn't support speedbin which is
> -  * fine
> +  * -ENOENT/EOPNOTSUPP means that the platform doesn't support speedbin
> +  * which is fine
>*/
> - if (ret == -ENOENT) {
> + if (ret == -ENOENT || ret == -EOPNOTSUPP) {
>   return 0;
>   } else if (ret) {
>   dev_err_probe(dev, ret,
> @@ -2283,7 +2285,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  
>   a6xx_llc_slices_init(pdev, a6xx_gpu, is_a7xx);
>  
> - ret = a6xx_set_supported_hw(&pdev->dev, config->info);
> + ret = a6xx_set_supported_hw(adreno_gpu, &pdev->dev, config->info);
>   if (ret) {
>   a6xx_llc_slices_destroy(a6xx_gpu);
>   kfree(a6xx_gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index cfc74a9e2646..0842ea76e616 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -6,6 +6,8 @@
>   * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
>   */
>  
> +#include 
> +
>  #include "adreno_gpu.h"
>  
>  bool hang_debug = false;
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1c6626747b98..cf6652c4439d 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -21,6 +21,9 @@
>  #include "msm_gem.h"
>  #include "msm_mmu.h"
>  
> +#include 
> +#include 
> +
>  static u64 address_space_size = 0;
>  MODULE_PARM_DESC(address_space_size, "Override for size of processes private 
> GPU address space");
>  module_param(address_space_size, ullong, 0600);
> @@ -1061,9 +1064,40 @@ void adreno_gpu_ocmem_cleanup(struct adreno_ocmem 
> *adreno_ocmem)
>  adreno_ocmem->hdl);
>  }
>  
> -int adreno_read_speedbin(struct device *dev, u32 *speedbin)
> +int adreno_read_speedbin(struct adreno_gpu *adreno_gpu,
> +  struct device *dev, u32 *fuse)
>  {
> - return nvmem_cell_read_variable_le_u32(dev, "speed_bin", speedbin);
> + int ret;
> +
> + /*
> +  * Try reading the speedbin via a nvmem cell first
> +  * -ENOENT means "no nvmem-cells" and essentially means "old DT" or
> +  * "nvmem fuse is irrelevant", simply assume it's fine.
> +  */
> + ret = nvmem_cell_read_variable_le_u32(dev, "speed_bin", fuse);
> + if (!ret)
> + return 0;
> + else if (ret != -ENOENT)
> + return dev_err_probe(dev, ret, "Couldn't read the speed bin 
> fuse value\n");
> +
> +#ifdef CONFIG_QCOM_SMEM
> + u32 fcode;
> +
> + /*
> +  * Only check the feature code - the product code only matters for
> +  * proto SoCs unavailable outside Qualcomm labs, as far as GPU bin
> +  * matching is concerned.
> +  *
> +  * Ignore EOPNOTSUPP, as not all SoCs expose this info through SMEM.
> +  */
> + ret = qcom_smem_get_feature_code(&fcode);
> + if (!ret)
> + *fuse = ADRENO_SKU_ID(fcode);
> + else if (ret != -EOPNOTSUPP)
> + return dev_err_probe(dev, ret, "Couldn't get feature code from 
> SMEM\n");
> +#endif
> +
> + return ret;
>  }
>  
>  int adreno

Re: [PATCH v2 3/5] drm/msm/adreno: Introduce gmu_chipid for a740 & a750

2024-06-30 Thread Akhil P Oommen
On Sat, Jun 29, 2024 at 03:06:22PM +0200, Konrad Dybcio wrote:
> On 29.06.2024 3:49 AM, Akhil P Oommen wrote:
> > To simplify, introduce the new gmu_chipid for a740 & a750 GPUs.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> 
> This gets rid of getting patchid from dts, but I suppose that's fine,
> as we can just add a new entry to the id table
> 
> [...]
> 
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -771,7 +771,7 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > unsigned int state)
> > struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> > const struct a6xx_info *a6xx_info = adreno_gpu->info->a6xx;
> > u32 fence_range_lower, fence_range_upper;
> > -   u32 chipid, chipid_min = 0;
> > +   u32 chipid = 0;
> 
> The initialization doesn't seem necessary

Rob, would it be possible to fix this up when you pick this patch?

-Akhil.

> 
> otherwise:
> 
> Reviewed-by: Konrad Dybcio 
> 
> Konrad


Re: [PATCH v4 4/5] drm/msm/adreno: Redo the speedbin assignment

2024-06-30 Thread Akhil P Oommen
On Tue, Jun 25, 2024 at 08:28:09PM +0200, Konrad Dybcio wrote:
> There is no need to reinvent the wheel for simple read-match-set logic.
> 
> Make speedbin discovery and assignment generation independent.
> 
> This implicitly removes the bogus 0x80 / BIT(7) speed bin on A5xx,
> which has no representation in hardware whatsoever.
> 
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 34 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 56 
> -
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 51 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 --
>  4 files changed, 45 insertions(+), 99 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c003f970189b..eed6a2eb1731 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1704,38 +1704,6 @@ static const struct adreno_gpu_funcs funcs = {
>   .get_timestamp = a5xx_get_timestamp,
>  };
>  
> -static void check_speed_bin(struct device *dev)
> -{
> - struct nvmem_cell *cell;
> - u32 val;
> -
> - /*
> -  * If the OPP table specifies a opp-supported-hw property then we have
> -  * to set something with dev_pm_opp_set_supported_hw() or the table
> -  * doesn't get populated so pick an arbitrary value that should
> -  * ensure the default frequencies are selected but not conflict with any
> -  * actual bins
> -  */
> - val = 0x80;
> -
> - cell = nvmem_cell_get(dev, "speed_bin");
> -
> - if (!IS_ERR(cell)) {
> - void *buf = nvmem_cell_read(cell, NULL);
> -
> - if (!IS_ERR(buf)) {
> - u8 bin = *((u8 *) buf);
> -
> - val = (1 << bin);
> - kfree(buf);
> - }
> -
> - nvmem_cell_put(cell);
> - }
> -
> - devm_pm_opp_set_supported_hw(dev, &val, 1);
> -}
> -
>  struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  {
>   struct msm_drm_private *priv = dev->dev_private;
> @@ -1763,8 +1731,6 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
>  
>   a5xx_gpu->lm_leakage = 0x4E001A;
>  
> - check_speed_bin(&pdev->dev);
> -
>   nr_rings = 4;
>  
>   if (config->info->revn == 510)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 8ace096bb68c..f038e5f1fe59 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2112,55 +2112,6 @@ static bool a6xx_progress(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>   return progress;
>  }
>  
> -static u32 fuse_to_supp_hw(const struct adreno_info *info, u32 fuse)
> -{
> - if (!info->speedbins)
> - return UINT_MAX;
> -
> - for (int i = 0; info->speedbins[i].fuse != SHRT_MAX; i++)
> - if (info->speedbins[i].fuse == fuse)
> - return BIT(info->speedbins[i].speedbin);
> -
> - return UINT_MAX;
> -}
> -
> -static int a6xx_set_supported_hw(struct adreno_gpu *adreno_gpu,
> -  struct device *dev,
> -  const struct adreno_info *info)
> -{
> - u32 supp_hw;
> - u32 speedbin;
> - int ret;
> -
> - ret = adreno_read_speedbin(adreno_gpu, dev, &speedbin);
> - /*
> -  * -ENOENT means that the platform doesn't support speedbin which is
> -  * fine
> -  */
> - if (ret == -ENOENT) {
> - return 0;
> - } else if (ret) {
> - dev_err_probe(dev, ret,
> -   "failed to read speed-bin. Some OPPs may not be 
> supported by hardware\n");
> - return ret;
> - }
> -
> - supp_hw = fuse_to_supp_hw(info, speedbin);
> -
> - if (supp_hw == UINT_MAX) {
> - DRM_DEV_ERROR(dev,
> - "missing support for speed-bin: %u. Some OPPs may not 
> be supported by hardware\n",
> - speedbin);
> - supp_hw = BIT(0); /* Default */
> - }
> -
> - ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
> - if (ret)
> - return ret;
> -
> - return 0;
> -}
> -
>  static const struct adreno_gpu_funcs funcs = {
>   .base = {
>   .get_param = adreno_get_param,
> @@ -2292,13 +2243,6 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  
>   a6xx_llc_slices_init(pdev, a6xx_gpu, is_a7xx);
>  
> - ret = a6xx_set_supported_hw(adreno_gpu, &pdev->dev, config->info);
> - if (ret) {
> - a6xx_llc_slices_destroy(a6xx_gpu);
> - kfree(a6xx_gpu);
> - return ERR_PTR(ret);
> - }
> -
>   if (is_a7xx)
>   ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
>   else if (adreno_has_gmu_wrapper(adreno_gpu))
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_g

Re: [PATCH v4 1/5] drm/msm/adreno: Implement SMEM-based speed bin

2024-06-30 Thread Akhil P Oommen
On Tue, Jun 25, 2024 at 08:28:06PM +0200, Konrad Dybcio wrote:
> On recent (SM8550+) Snapdragon platforms, the GPU speed bin data is
> abstracted through SMEM, instead of being directly available in a fuse.
> 
> Add support for SMEM-based speed binning, which includes getting
> "feature code" and "product code" from said source and parsing them
> to form something that lets us match OPPs against.
> 
> Due to the product code being ignored in the context of Adreno on
> production parts (as of SM8650), hardcode it to SOCINFO_PC_UNKNOWN.
> 
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  8 +++---
>  drivers/gpu/drm/msm/adreno/adreno_device.c |  2 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c| 41 
> +++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  7 -
>  4 files changed, 50 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c98cdb1e9326..8ace096bb68c 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2124,13 +2124,15 @@ static u32 fuse_to_supp_hw(const struct adreno_info 
> *info, u32 fuse)
>   return UINT_MAX;
>  }
>  
> -static int a6xx_set_supported_hw(struct device *dev, const struct 
> adreno_info *info)
> +static int a6xx_set_supported_hw(struct adreno_gpu *adreno_gpu,
> +  struct device *dev,
> +  const struct adreno_info *info)
>  {
>   u32 supp_hw;
>   u32 speedbin;
>   int ret;
>  
> - ret = adreno_read_speedbin(dev, &speedbin);
> + ret = adreno_read_speedbin(adreno_gpu, dev, &speedbin);
>   /*
>* -ENOENT means that the platform doesn't support speedbin which is
>* fine
> @@ -2290,7 +2292,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  
>   a6xx_llc_slices_init(pdev, a6xx_gpu, is_a7xx);
>  
> - ret = a6xx_set_supported_hw(&pdev->dev, config->info);
> + ret = a6xx_set_supported_hw(adreno_gpu, &pdev->dev, config->info);
>   if (ret) {
>   a6xx_llc_slices_destroy(a6xx_gpu);
>   kfree(a6xx_gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 1e789ff6945e..e514346088f9 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -6,6 +6,8 @@
>   * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
>   */
>  
> +#include 
> +
>  #include "adreno_gpu.h"
>  
>  bool hang_debug = false;
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1c6626747b98..6ffd02f38499 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -21,6 +21,9 @@
>  #include "msm_gem.h"
>  #include "msm_mmu.h"
>  
> +#include 
> +#include 
> +
>  static u64 address_space_size = 0;
>  MODULE_PARM_DESC(address_space_size, "Override for size of processes private 
> GPU address space");
>  module_param(address_space_size, ullong, 0600);
> @@ -1061,9 +1064,39 @@ void adreno_gpu_ocmem_cleanup(struct adreno_ocmem 
> *adreno_ocmem)
>  adreno_ocmem->hdl);
>  }
>  
> -int adreno_read_speedbin(struct device *dev, u32 *speedbin)
> +int adreno_read_speedbin(struct adreno_gpu *adreno_gpu,
> +  struct device *dev, u32 *fuse)
>  {
> - return nvmem_cell_read_variable_le_u32(dev, "speed_bin", speedbin);
> + u32 fcode;
> + int ret;
> +
> + /*
> +  * Try reading the speedbin via a nvmem cell first
> +  * -ENOENT means "no nvmem-cells" and essentially means "old DT" or
> +  * "nvmem fuse is irrelevant", simply assume it's fine.
> +  */
> + ret = nvmem_cell_read_variable_le_u32(dev, "speed_bin", fuse);
> + if (!ret)
> + return 0;
> + else if (ret != -ENOENT)
> + return dev_err_probe(dev, ret, "Couldn't read the speed bin 
> fuse value\n");
> +
> +#ifdef CONFIG_QCOM_SMEM
> + /*
> +  * Only check the feature code - the product code only matters for
> +  * proto SoCs unavailable outside Qualcomm labs, as far as GPU bin
> +  * matching is concerned.
> +  *
> +  * Ignore EOPNOTSUPP, as not all SoCs expose this info through SMEM.
> +  */
> + ret = qcom_smem_get_feature_code(&fcode);
> + if (!ret)
> + *fuse = ADRENO_SKU_ID(fcode);
> + else if (ret != -EOPNOTSUPP)
> + return dev_err_probe(dev, ret, "Couldn't get feature code from 
> SMEM\n");
> +#endif
> +
> + return 0;
>  }
>  
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> @@ -1102,9 +1135,9 @@ int adreno_gpu_init(struct drm_device *drm, struct 
> platform_device *pdev,
>   devm_pm_opp_set_clkname(dev, "core");
>   }
>  
> - if (adreno_read_speedbin(dev, &speedbin) || !speedbi

Re: [PATCH v4 1/5] drm/msm/adreno: Split up giant device table

2024-06-30 Thread Akhil P Oommen
On Sat, Jun 29, 2024 at 06:32:05AM -0700, Rob Clark wrote:
> On Fri, Jun 28, 2024 at 6:58 PM Akhil P Oommen  
> wrote:
> >
> > On Tue, Jun 18, 2024 at 09:42:47AM -0700, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > Split into a separate table per generation, in preparation to move each
> > > gen's device table to it's own file.
> > >
> > > Signed-off-by: Rob Clark 
> > > Reviewed-by: Dmitry Baryshkov 
> > > Reviewed-by: Konrad Dybcio 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/adreno_device.c | 67 +-
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.h| 10 
> > >  2 files changed, 63 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> > > b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > index c3703a51287b..a57659eaddc2 100644
> > > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > @@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
> > >  MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in 
> > > place of IOMMU");
> > >  module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
> > >
> > > -static const struct adreno_info gpulist[] = {
> > > +static const struct adreno_info a2xx_gpus[] = {
> > >   {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x0200),
> > >   .family = ADRENO_2XX_GEN1,
> > > @@ -54,7 +54,12 @@ static const struct adreno_info gpulist[] = {
> > >   .gmem  = SZ_512K,
> > >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> > >   .init  = a2xx_gpu_init,
> > > - }, {
> > > + }
> > > +};
> > > +DECLARE_ADRENO_GPULIST(a2xx);
> > > +
> > > +static const struct adreno_info a3xx_gpus[] = {
> > > + {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
> > >   .family = ADRENO_3XX,
> > >   .fw = {
> > > @@ -116,7 +121,12 @@ static const struct adreno_info gpulist[] = {
> > >   .gmem  = SZ_1M,
> > >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> > >   .init  = a3xx_gpu_init,
> > > - }, {
> > > + }
> > > +};
> > > +DECLARE_ADRENO_GPULIST(a3xx);
> > > +
> > > +static const struct adreno_info a4xx_gpus[] = {
> > > + {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x04000500),
> > >   .family = ADRENO_4XX,
> > >   .revn  = 405,
> > > @@ -149,7 +159,12 @@ static const struct adreno_info gpulist[] = {
> > >   .gmem  = (SZ_1M + SZ_512K),
> > >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> > >   .init  = a4xx_gpu_init,
> > > - }, {
> > > + }
> > > +};
> > > +DECLARE_ADRENO_GPULIST(a4xx);
> > > +
> > > +static const struct adreno_info a5xx_gpus[] = {
> > > + {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x05000600),
> > >   .family = ADRENO_5XX,
> > >   .revn = 506,
> > > @@ -274,7 +289,12 @@ static const struct adreno_info gpulist[] = {
> > >   .quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
> > >   .init = a5xx_gpu_init,
> > >   .zapfw = "a540_zap.mdt",
> > > - }, {
> > > + }
> > > +};
> > > +DECLARE_ADRENO_GPULIST(a5xx);
> > > +
> > > +static const struct adreno_info a6xx_gpus[] = {
> > > + {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x0601),
> > >   .family = ADRENO_6XX_GEN1,
> > >   .revn = 610,
> > > @@ -520,7 +540,12 @@ static const struct adreno_info gpulist[] = {
> > >   .zapfw = "a690_zap.mdt",
> > >   .hwcg = a690_hwcg,
> > >   .address_space_size = SZ_16G,
> > > - }, {
> > > + }
> > > +};
> > > +DECLARE_ADRENO_GPULIST(a6xx);
> > > +
> > > +static const struct adreno_info a7xx_gpus[] = {
> > > + {
> > >   .chip_ids = ADRENO_CHIP_IDS(0x07000200),
> > >   .family = ADRENO_6XX_GEN1, /* NOT a mistake! */
> > >   .fw = {
> > > @@ -582,7 +607,17 @@ static const struct adreno_info gpulist[] = {

Re: [PATCH v2 0/5] Support for Adreno X1-85 GPU

2024-06-28 Thread Akhil P Oommen
On Sat, Jun 29, 2024 at 07:19:33AM +0530, Akhil P Oommen wrote:
> This series adds support for the Adreno X1-85 GPU found in Qualcomm's
> compute series chipset, Snapdragon X1 Elite (x1e80100). In this new
> naming scheme for Adreno GPU, 'X' stands for compute series, '1' denotes
> 1st generation and '8' & '5' denote the tier and the SKU to which it
> belongs.
> 
> X1-85 has major focus on doubling core clock frequency and bandwidth
> throughput. It has a dedicated collapsible Graphics MX rail (gmxc) to
> power the memories and double the number of data channels to improve
> bandwidth to DDR.
> 
> Mesa has the necessary bits present already to support this GPU. We are
> able to bring up Gnome desktop by hardcoding "0x43050a01" as
> chipid. Also, verified glxgears and glmark2. We have plans to add the
> new chipid support to Mesa in next few weeks, but these patches can go in
> right away to get included in v6.11.
> 
> This series is rebased on top of msm-next branch. P3 cherry-picks cleanly on
> qcom/for-next.

A typo here: P5 cherry-picks cleanly on qcom/for-next.

-Akhil
> 
> P1, P2 & P3 for Rob Clark
> P4 for Will Deacon
> P5 for Bjorn to pick up.
> 
> Changes in v2:
> - Minor update to compatible pattern, '[x]' -> 'x'
> - Increased address space size (Rob)
> - Introduced gmu_chipid in a6xx_info (Rob)
> - Improved fallback logic for gmxc (Dmitry)
> - Rebased on top of msm-next
> - Picked a new patch for arm-mmu bindings update
> - Reordered gpu & gmu reg enties to match schema
> 
> Akhil P Oommen (5):
>   dt-bindings: display/msm/gmu: Add Adreno X185 GMU
>   drm/msm/adreno: Add support for X185 GPU
>   drm/msm/adreno: Introduce gmu_chipid for a740 & a750
>   dt-bindings: arm-smmu: Add X1E80100 GPU SMMU
>   arm64: dts: qcom: x1e80100: Add gpu support
> 
>  .../devicetree/bindings/display/msm/gmu.yaml  |   4 +
>  .../devicetree/bindings/iommu/arm,smmu.yaml   |   3 +-
>  arch/arm64/boot/dts/qcom/x1e80100.dtsi| 195 ++
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c |  20 ++
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  34 +--
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |   2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   5 +
>  8 files changed, 239 insertions(+), 25 deletions(-)
> 
> -- 
> 2.45.1
> 


Re: [PATCH v4 1/5] drm/msm/adreno: Split up giant device table

2024-06-28 Thread Akhil P Oommen
On Tue, Jun 18, 2024 at 09:42:47AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Split into a separate table per generation, in preparation to move each
> gen's device table to it's own file.
> 
> Signed-off-by: Rob Clark 
> Reviewed-by: Dmitry Baryshkov 
> Reviewed-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/adreno_device.c | 67 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h| 10 
>  2 files changed, 63 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index c3703a51287b..a57659eaddc2 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
>  MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place 
> of IOMMU");
>  module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
>  
> -static const struct adreno_info gpulist[] = {
> +static const struct adreno_info a2xx_gpus[] = {
>   {
>   .chip_ids = ADRENO_CHIP_IDS(0x0200),
>   .family = ADRENO_2XX_GEN1,
> @@ -54,7 +54,12 @@ static const struct adreno_info gpulist[] = {
>   .gmem  = SZ_512K,
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
>   .init  = a2xx_gpu_init,
> - }, {
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a2xx);
> +
> +static const struct adreno_info a3xx_gpus[] = {
> + {
>   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
>   .family = ADRENO_3XX,
>   .fw = {
> @@ -116,7 +121,12 @@ static const struct adreno_info gpulist[] = {
>   .gmem  = SZ_1M,
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
>   .init  = a3xx_gpu_init,
> - }, {
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a3xx);
> +
> +static const struct adreno_info a4xx_gpus[] = {
> + {
>   .chip_ids = ADRENO_CHIP_IDS(0x04000500),
>   .family = ADRENO_4XX,
>   .revn  = 405,
> @@ -149,7 +159,12 @@ static const struct adreno_info gpulist[] = {
>   .gmem  = (SZ_1M + SZ_512K),
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
>   .init  = a4xx_gpu_init,
> - }, {
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a4xx);
> +
> +static const struct adreno_info a5xx_gpus[] = {
> + {
>   .chip_ids = ADRENO_CHIP_IDS(0x05000600),
>   .family = ADRENO_5XX,
>   .revn = 506,
> @@ -274,7 +289,12 @@ static const struct adreno_info gpulist[] = {
>   .quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
>   .init = a5xx_gpu_init,
>   .zapfw = "a540_zap.mdt",
> - }, {
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a5xx);
> +
> +static const struct adreno_info a6xx_gpus[] = {
> + {
>   .chip_ids = ADRENO_CHIP_IDS(0x0601),
>   .family = ADRENO_6XX_GEN1,
>   .revn = 610,
> @@ -520,7 +540,12 @@ static const struct adreno_info gpulist[] = {
>   .zapfw = "a690_zap.mdt",
>   .hwcg = a690_hwcg,
>   .address_space_size = SZ_16G,
> - }, {
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a6xx);
> +
> +static const struct adreno_info a7xx_gpus[] = {
> + {
>   .chip_ids = ADRENO_CHIP_IDS(0x07000200),
>   .family = ADRENO_6XX_GEN1, /* NOT a mistake! */
>   .fw = {
> @@ -582,7 +607,17 @@ static const struct adreno_info gpulist[] = {
>   .init = a6xx_gpu_init,
>   .zapfw = "gen70900_zap.mbn",
>   .address_space_size = SZ_16G,
> - },
> + }
> +};
> +DECLARE_ADRENO_GPULIST(a7xx);
> +
> +static const struct adreno_gpulist *gpulists[] = {
> + &a2xx_gpulist,
> + &a3xx_gpulist,
> + &a4xx_gpulist,
> + &a5xx_gpulist,
> + &a6xx_gpulist,
> + &a6xx_gpulist,

Typo. a6xx_gpulist -> a7xx_gpulist.

-Akhil

>  };
>  
>  MODULE_FIRMWARE("qcom/a300_pm4.fw");
> @@ -617,13 +652,17 @@ MODULE_FIRMWARE("qcom/yamato_pm4.fw");
>  static const struct adreno_info *adreno_info(uint32_t chip_id)
>  {
>   /* identify gpu: */
> - for (int i = 0; i < ARRAY_SIZE(gpulist); i++) {
> - const struct adreno_info *info = &gpulist[i];
> - if (info->machine && !of_machine_is_compatible(info->machine))
> - continue;
> - for (int j = 0; info->chip_ids[j]; j++)
> - if (info->chip_ids[j] == chip_id)
> - return info;
> + for (int i = 0; i < ARRAY_SIZE(gpulists); i++) {
> + for (int j = 0; j < gpulists[i]->gpus_count; j++) {
> + const struct adreno_info *info = &gpulists[i]->gpus[j];
> +
> + if (info->machine && 
> !of_machine_is_compatible(info->machine))
> + continue;
> +
> + for (int k = 0; info->chip_ids[k]; k++)
> +

[PATCH v2 5/5] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-28 Thread Akhil P Oommen
Add the necessary dt nodes for gpu support in X1E80100.

Signed-off-by: Akhil P Oommen 
---

Changes in v2:
- Reordered gpu & gmu reg enties to match schema

 arch/arm64/boot/dts/qcom/x1e80100.dtsi | 195 +
 1 file changed, 195 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi 
b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index 5f90a0b3c016..f043204aa12f 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2985,6 +2986,200 @@ tcsr: clock-controller@1fc {
#reset-cells = <1>;
};
 
+   gpu: gpu@3d0 {
+   compatible = "qcom,adreno-43050c01", "qcom,adreno";
+   reg = <0x0 0x03d0 0x0 0x4>,
+ <0x0 0x03d9e000 0x0 0x1000>,
+ <0x0 0x03d61000 0x0 0x800>;
+
+   reg-names = "kgsl_3d0_reg_memory",
+   "cx_mem",
+   "cx_dbgc";
+
+   interrupts = ;
+
+   iommus = <&adreno_smmu 0 0x0>,
+<&adreno_smmu 1 0x0>;
+
+   operating-points-v2 = <&gpu_opp_table>;
+
+   qcom,gmu = <&gmu>;
+   #cooling-cells = <2>;
+
+   interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt 
SLAVE_EBI1 0>;
+   interconnect-names = "gfx-mem";
+
+   zap-shader {
+   memory-region = <&gpu_microcode_mem>;
+   firmware-name = "qcom/gen70500_zap.mbn";
+   };
+
+   gpu_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-11 {
+   opp-hz = /bits/ 64 <11>;
+   opp-level = 
;
+   opp-peak-kBps = <1650>;
+   };
+
+   opp-10 {
+   opp-hz = /bits/ 64 <10>;
+   opp-level = 
;
+   opp-peak-kBps = <14398438>;
+   };
+
+   opp-92500 {
+   opp-hz = /bits/ 64 <92500>;
+   opp-level = 
;
+   opp-peak-kBps = <14398438>;
+   };
+
+   opp-8 {
+   opp-hz = /bits/ 64 <8>;
+   opp-level = ;
+   opp-peak-kBps = <12449219>;
+   };
+
+   opp-74400 {
+   opp-hz = /bits/ 64 <74400>;
+   opp-level = 
;
+   opp-peak-kBps = <10687500>;
+   };
+
+   opp-68700 {
+   opp-hz = /bits/ 64 <68700>;
+   opp-level = 
;
+   opp-peak-kBps = <8171875>;
+   };
+
+   opp-55000 {
+   opp-hz = /bits/ 64 <55000>;
+   opp-level = ;
+   opp-peak-kBps = <6074219>;
+   };
+
+   opp-39000 {
+   opp-hz = /bits/ 64 <39000>;
+   opp-level = 
;
+   opp-peak-kBps = <300>;
+   };
+
+   opp-3 {
+   opp-hz = /bits/ 64 <3>;
+   opp-level = 
;
+   opp-peak-kBps = <2136719>;
+   };
+   };
+   };
+
+   gmu: gmu@3d6a000 {
+   compatible = "qcom,adreno-gmu-x185.1", 
"qcom,adreno-gmu";
+   reg = <0x0 0x03d6a000 0x0 0x35000>,
+   

[PATCH v2 4/5] dt-bindings: arm-smmu: Add X1E80100 GPU SMMU

2024-06-28 Thread Akhil P Oommen
Update the devicetree bindings to support the gpu present in
X1E80100 platform.

Signed-off-by: Akhil P Oommen 
---

Changes in v2:
- New patch in v2

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 5c130cf06a21..7ef225d4d783 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -95,6 +95,7 @@ properties:
   - qcom,sm8450-smmu-500
   - qcom,sm8550-smmu-500
   - qcom,sm8650-smmu-500
+  - qcom,x1e80100-smmu-500
   - const: qcom,adreno-smmu
   - const: qcom,smmu-500
   - const: arm,mmu-500
@@ -520,6 +521,7 @@ allOf:
 - enum:
 - qcom,sm8550-smmu-500
 - qcom,sm8650-smmu-500
+- qcom,x1e80100-smmu-500
 - const: qcom,adreno-smmu
 - const: qcom,smmu-500
 - const: arm,mmu-500
@@ -557,7 +559,6 @@ allOf:
   - qcom,sdx65-smmu-500
   - qcom,sm6350-smmu-500
   - qcom,sm6375-smmu-500
-  - qcom,x1e80100-smmu-500
 then:
   properties:
 clock-names: false
-- 
2.45.1



[PATCH v2 2/5] drm/msm/adreno: Add support for X185 GPU

2024-06-28 Thread Akhil P Oommen
Add support in drm/msm driver for the Adreno X185 gpu found in
Snapdragon X1 Elite chipset.

Signed-off-by: Akhil P Oommen 
---

Changes in v2:
- Increased address space size (Rob)
- Introduced gmu_chipid in a6xx_info (Rob)
- Improved fallback logic for gmxc (Dmitry)

 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 18 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  5 +
 5 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 53e33ff78411..c507681648ac 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1208,6 +1208,24 @@ static const struct adreno_info a7xx_gpus[] = {
.protect = &a730_protect,
},
.address_space_size = SZ_16G,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x43050c01), /* "C512v2" */
+   .family = ADRENO_7XX_GEN2,
+   .fw = {
+   [ADRENO_FW_SQE] = "gen70500_sqe.fw",
+   [ADRENO_FW_GMU] = "gen70500_gmu.bin",
+   },
+   .gmem = 3 * SZ_1M,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
+ ADRENO_QUIRK_HAS_HW_APRIV,
+   .init = a6xx_gpu_init,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a740_hwcg,
+   .protect = &a730_protect,
+   .gmu_chipid = 0x7050001,
+   },
+   .address_space_size = SZ_256G,
}, {
.chip_ids = ADRENO_CHIP_IDS(0x43051401), /* "C520v2" */
.family = ADRENO_7XX_GEN3,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 0e3dfd4c2bc8..20034aa2fad8 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -769,6 +769,7 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, unsigned 
int state)
 {
struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   const struct a6xx_info *a6xx_info = adreno_gpu->info->a6xx;
u32 fence_range_lower, fence_range_upper;
u32 chipid, chipid_min = 0;
int ret;
@@ -830,8 +831,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
 */
gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
 
+   if (a6xx_info->gmu_chipid) {
+   chipid = a6xx_info->gmu_chipid;
/* NOTE: A730 may also fall in this if-condition with a future GMU fw 
update. */
-   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
+   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
/* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
 
@@ -1329,7 +1332,13 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device 
*dev, u32 *votes,
if (!pri_count)
return -EINVAL;
 
-   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
+   /*
+* Some targets have a separate gfx mxc rail. So try to read that first 
and then fall back
+* to regular mx rail if it is missing
+*/
+   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
+   if (IS_ERR(sec) && sec != ERR_PTR(-EPROBE_DEFER))
+   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
if (IS_ERR(sec))
return PTR_ERR(sec);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c98cdb1e9326..092e0a1dd612 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1031,7 +1031,7 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
 
/* Set weights for bicubic filtering */
-   if (adreno_is_a650_family(adreno_gpu)) {
+   if (adreno_is_a650_family(adreno_gpu) || adreno_is_x185(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
0x3fe05ff4);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 1c3cc6df70fe..e3e5c53ae8af 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -21,6 +21,7 @@ extern bool hang_debug;
 struct a6xx_info {
const struct adreno_regli

[PATCH v2 3/5] drm/msm/adreno: Introduce gmu_chipid for a740 & a750

2024-06-28 Thread Akhil P Oommen
To simplify, introduce the new gmu_chipid for a740 & a750 GPUs.

Signed-off-by: Akhil P Oommen 
---

Changes in v2:
- New patch in v2

 drivers/gpu/drm/msm/adreno/a6xx_catalog.c |  2 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 23 +--
 2 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index c507681648ac..bdafca7267a8 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1206,6 +1206,7 @@ static const struct adreno_info a7xx_gpus[] = {
.a6xx = &(const struct a6xx_info) {
.hwcg = a740_hwcg,
.protect = &a730_protect,
+   .gmu_chipid = 0x7020100,
},
.address_space_size = SZ_16G,
}, {
@@ -1241,6 +1242,7 @@ static const struct adreno_info a7xx_gpus[] = {
.zapfw = "gen70900_zap.mbn",
.a6xx = &(const struct a6xx_info) {
.protect = &a730_protect,
+   .gmu_chipid = 0x7090100,
},
.address_space_size = SZ_16G,
}
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 20034aa2fad8..e4c430504daa 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -771,7 +771,7 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, unsigned 
int state)
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
const struct a6xx_info *a6xx_info = adreno_gpu->info->a6xx;
u32 fence_range_lower, fence_range_upper;
-   u32 chipid, chipid_min = 0;
+   u32 chipid = 0;
int ret;
 
/* Vote veto for FAL10 */
@@ -833,27 +833,6 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
 
if (a6xx_info->gmu_chipid) {
chipid = a6xx_info->gmu_chipid;
-   /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
update. */
-   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
-   /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
-   chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
-
-   /*
-* The min part has a 1-1 mapping for each GPU SKU.
-* This chipid that the GMU expects corresponds to the 
"GENX_Y_Z" naming,
-* where X = major, Y = minor, Z = patchlevel, e.g. GEN7_2_1 
for prod A740.
-*/
-   if (adreno_is_a740(adreno_gpu))
-   chipid_min = 2;
-   else if (adreno_is_a750(adreno_gpu))
-   chipid_min = 9;
-   else
-   return -EINVAL;
-
-   chipid |= FIELD_PREP(GENMASK(23, 16), chipid_min);
-
-   /* Get the patchid (which may vary) from the device tree */
-   chipid |= FIELD_PREP(GENMASK(15, 8), 
adreno_patchid(adreno_gpu));
} else {
/*
 * Note that the GMU has a slightly different layout for
-- 
2.45.1



[PATCH v2 0/5] Support for Adreno X1-85 GPU

2024-06-28 Thread Akhil P Oommen
This series adds support for the Adreno X1-85 GPU found in Qualcomm's
compute series chipset, Snapdragon X1 Elite (x1e80100). In this new
naming scheme for Adreno GPU, 'X' stands for compute series, '1' denotes
1st generation and '8' & '5' denote the tier and the SKU to which it
belongs.

X1-85 has major focus on doubling core clock frequency and bandwidth
throughput. It has a dedicated collapsible Graphics MX rail (gmxc) to
power the memories and double the number of data channels to improve
bandwidth to DDR.

Mesa has the necessary bits present already to support this GPU. We are
able to bring up Gnome desktop by hardcoding "0x43050a01" as
chipid. Also, verified glxgears and glmark2. We have plans to add the
new chipid support to Mesa in next few weeks, but these patches can go in
right away to get included in v6.11.

This series is rebased on top of msm-next branch. P3 cherry-picks cleanly on
qcom/for-next.

P1, P2 & P3 for Rob Clark
P4 for Will Deacon
P5 for Bjorn to pick up.

Changes in v2:
- Minor update to compatible pattern, '[x]' -> 'x'
- Increased address space size (Rob)
- Introduced gmu_chipid in a6xx_info (Rob)
- Improved fallback logic for gmxc (Dmitry)
- Rebased on top of msm-next
- Picked a new patch for arm-mmu bindings update
- Reordered gpu & gmu reg enties to match schema

Akhil P Oommen (5):
  dt-bindings: display/msm/gmu: Add Adreno X185 GMU
  drm/msm/adreno: Add support for X185 GPU
  drm/msm/adreno: Introduce gmu_chipid for a740 & a750
  dt-bindings: arm-smmu: Add X1E80100 GPU SMMU
  arm64: dts: qcom: x1e80100: Add gpu support

 .../devicetree/bindings/display/msm/gmu.yaml  |   4 +
 .../devicetree/bindings/iommu/arm,smmu.yaml   |   3 +-
 arch/arm64/boot/dts/qcom/x1e80100.dtsi| 195 ++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c |  20 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  34 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   5 +
 8 files changed, 239 insertions(+), 25 deletions(-)

-- 
2.45.1



[PATCH v2 1/5] dt-bindings: display/msm/gmu: Add Adreno X185 GMU

2024-06-28 Thread Akhil P Oommen
Document Adreno X185 GMU in the dt-binding specification.

Signed-off-by: Akhil P Oommen 
Reviewed-by: Krzysztof Kozlowski 
---

Changes in v2:
- Minor update to compatible pattern, '[x]' -> 'x'

 Documentation/devicetree/bindings/display/msm/gmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index b3837368a260..b1bd372996d5 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -23,6 +23,9 @@ properties:
   - items:
   - pattern: '^qcom,adreno-gmu-[67][0-9][0-9]\.[0-9]$'
   - const: qcom,adreno-gmu
+  - items:
+  - pattern: '^qcom,adreno-gmu-x[1-9][0-9][0-9]\.[0-9]$'
+  - const: qcom,adreno-gmu
   - const: qcom,adreno-gmu-wrapper
 
   reg:
@@ -225,6 +228,7 @@ allOf:
   - qcom,adreno-gmu-730.1
   - qcom,adreno-gmu-740.1
   - qcom,adreno-gmu-750.1
+  - qcom,adreno-gmu-x185.1
 then:
   properties:
 reg:
-- 
2.45.1



Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-26 Thread Akhil P Oommen
<< snip >>

> > > > > > @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> > > > > > platform_device *pdev,
> > > > > > return ERR_PTR(-EINVAL);
> > > > > > }
> > > > > >
> > > > > > -   ret = ioremap(res->start, resource_size(res));
> > > > > > +   ret = devm_ioremap_resource(&pdev->dev, res);
> > > > >
> > > > > So, this doesn't actually work, failing in __request_region_locked(),
> > > > > because the gmu region partially overlaps with the gpucc region (which
> > > > > is busy).  I think this is intentional, since gmu is controlling the
> > > > > gpu clocks, etc.  In particular REG_A6XX_GPU_CC_GX_GDSCR is in this
> > > > > overlapping region.  Maybe Akhil knows more about GMU.
> > > >
> > > > We don't really need to map gpucc region from driver on behalf of gmu.
> > > > Since we don't access any gpucc register from drm-msm driver, we can
> > > > update the range size to correct this. But due to backward compatibility
> > > > requirement with older dt, can we still enable region locking? I prefer
> > > > it if that is possible.
> > >
> > > Actually, when I reduced the region size to not overlap with gpucc,
> > > the region is smaller than REG_A6XX_GPU_CC_GX_GDSCR * 4.
> > >
> > > So I guess that register is actually part of gpucc?
> >
> > Yes. It has *GPU_CC* in its name. :P
> >
> > I just saw that we program this register on legacy a6xx targets to
> > ensure retention is really ON before collapsing gdsc. So we can't
> > avoid mapping gpucc region in legacy a6xx GPUs. That is unfortunate!
> 
> I guess we could still use devm_ioremap().. idk if there is a better
> way to solve this

Can we do it without breaking backward compatibility with dt?
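
If we go that route, I guess it would be as small as this inside
a6xx_gmu_get_mmio() (just a sketch; note devm_ioremap() returns NULL rather
than an ERR_PTR on failure):

	/* Map without reserving the region, since the GMU range overlaps
	 * gpucc on older DTs. */
	ret = devm_ioremap(&pdev->dev, res->start, resource_size(res));
	if (!ret)
		return ERR_PTR(-ENOMEM);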

-Akhil

> 
> BR,
> -R
> 
> > -Akhil.
> >
> > >
> > > BR,
> > > -R


Re: [PATCH v2 1/2] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-06-26 Thread Akhil P Oommen
On Wed, Jun 26, 2024 at 09:59:39AM +0200, Daniel Vetter wrote:
> On Tue, Jun 25, 2024 at 08:54:41PM +0200, Konrad Dybcio wrote:
> > Memory barriers help ensure instruction ordering, NOT time and order
> > of actual write arrival at other observers (e.g. memory-mapped IP).
> > On architectures employing weak memory ordering, the latter can be a
> > giant pain point, and it has been as part of this driver.
> > 
> > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > readl/writel, which include r/w (respectively) barriers.
> > 
> > Replace the barriers with a readback (or drop altogether where possible)
> > that ensures the previous writes have exited the write buffer (as the CPU
> > must flush the write to the register it's trying to read back).
> > 
> > Signed-off-by: Konrad Dybcio 
> 
> Some in pci these readbacks are actually part of the spec and called
> posting reads. I'd very much recommend drivers create a small wrapper
> function for these cases with a void return value, because it makes the
> code so much more legible and easier to understand.

For Adreno which is configured via mmio, we don't need to do this often. 
GBIF_HALT
is a scenario where we need to be extra careful as it can potentially cause some
internal lockup. Another scenario I can think of is GPU soft reset, where we need to
keep a delay on the CPU side after triggering. We should closely scrutinize any
other instance that comes up. So I feel a good justification as a comment here
would be enough to remind the reader. Think of it as a way to discourage the
use by making it hard.

This is a bit subjective, I am fine if you have a strong opinion on this.
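
For reference, the kind of posting-read helper being suggested would look
roughly like this (name and placement, e.g. msm_gpu.h, are hypothetical; just
a sketch):

	static inline void gpu_write_posted(struct msm_gpu *gpu, u32 reg, u32 data)
	{
		gpu_write(gpu, reg, data);
		/* Read back so the write has drained the CPU write buffer */
		gpu_read(gpu, reg);
	}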

-Akhil.

> -Sima
> 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  4 +---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 10 ++
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 0e3dfd4c2bc8..09d640165b18 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -466,9 +466,7 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> > int ret;
> > u32 val;
> >  
> > -   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > -   /* Wait for the register to finish posting */
> > -   wmb();
> > +   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> >  
> > ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> > val & (1 << 1), 100, 1);
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index c98cdb1e9326..4083d0cad782 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -855,14 +855,16 @@ static int hw_init(struct msm_gpu *gpu)
> > /* Clear GBIF halt in case GX domain was not collapsed */
> > if (adreno_is_a619_holi(adreno_gpu)) {
> > gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > +   gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > +
> > gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > -   /* Let's make extra sure that the GPU can access the memory.. */
> > -   mb();
> > +   gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
> > } else if (a6xx_has_gbif(adreno_gpu)) {
> > gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > +   gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > +
> > gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > -   /* Let's make extra sure that the GPU can access the memory.. */
> > -   mb();
> > +   gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
> > }
> >  
> > /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> > 
> > -- 
> > 2.45.2
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-26 Thread Akhil P Oommen
On Mon, Jun 24, 2024 at 03:57:35PM +0200, Konrad Dybcio wrote:
> 
> 
> On 6/23/24 13:06, Akhil P Oommen wrote:
> > Add the necessary dt nodes for gpu support in X1E80100.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> 
> [...]
> 
> > +
> > +   opp-11 {
> > +   opp-hz = /bits/ 64 <11>;
> > +   opp-level = 
> > ;
> > +   opp-peak-kBps = <1650>;
> 
> No speedbins?

Coming soon! I am waiting for some confirmations on some SKU related
data. This is the lowest Fmax among all SKUs which we can safely enable
for now.
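
Once that data is confirmed, wiring it up would be a catalog entry plus
opp-supported-hw in the OPP nodes, roughly like this (the fuse values below
are placeholders, not real SKU data):

	.speedbins = ADRENO_SPEEDBINS(
		{ /* fuse */ 0, /* bin */ 0 },	/* default SKU */
		/* additional { fuse, bin } pairs per SKU */
	),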

-Akhil.
> 
> Konrad


Re: [PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-26 Thread Akhil P Oommen
On Mon, Jun 24, 2024 at 12:23:42AM +0300, Dmitry Baryshkov wrote:
> On Sun, Jun 23, 2024 at 04:36:30PM GMT, Akhil P Oommen wrote:
> > Add the necessary dt nodes for gpu support in X1E80100.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > 
> >  arch/arm64/boot/dts/qcom/x1e80100.dtsi | 195 +
> >  1 file changed, 195 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi 
> > b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
> > index 5f90a0b3c016..3e887286bab4 100644
> > --- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
> > @@ -6,6 +6,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> 
> 
> > +   gmu: gmu@3d6a000 {
> > +   compatible = "qcom,adreno-gmu-x185.1", 
> > "qcom,adreno-gmu";
> > +   reg = <0x0 0x03d5 0x0 0x1>,
> > + <0x0 0x03d6a000 0x0 0x35000>,
> > + <0x0 0x0b28 0x0 0x1>;
> > +   reg-names =  "rscc", "gmu", "gmu_pdc";
> 
> The @address should match the first resource defined for a device.

I will reorder this and move gmu to first.

-Akhil.

> 
> > +
> -- 
> With best wishes
> Dmitry


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Akhil P Oommen
On Wed, Jun 26, 2024 at 11:43:08AM -0700, Rob Clark wrote:
> On Wed, Jun 26, 2024 at 1:24 AM Akhil P Oommen  
> wrote:
> >
> > On Mon, Jun 24, 2024 at 03:53:48PM +0200, Konrad Dybcio wrote:
> > >
> > >
> > > On 6/23/24 13:06, Akhil P Oommen wrote:
> > > > Add support in drm/msm driver for the Adreno X185 gpu found in
> > > > Snapdragon X1 Elite chipset.
> > > >
> > > > Signed-off-by: Akhil P Oommen 
> > > > ---
> > > >
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> > > >   drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> > > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> > > >   4 files changed, 36 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > > > unsigned int state)
> > > >  */
> > > > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> > > > +   if (adreno_is_x185(adreno_gpu)) {
> > > > +   chipid = 0x7050001;
> > >
> > > What's wrong with using the logic below?
> >
> > patchid is BITS(7, 0), not (15, 8) in the case of x185. Due to the
> > changes in the chipid scheme within the a7x family, this is a bit
> > confusing. I will try to improve here in another series.
> 
> I'm thinking we should just add gmu_chipid to struct a6xx_info, tbh
> 
> Maybe to start with, we can fall back to the existing logic if
> a6xx_info::gmu_chipid is zero so we don't have to add it for _every_
> a6xx/a7xx

Agree, I was thinking the same.
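
Something like this in a6xx_gmu_fw_start() should do it (sketch only; the
helper name for the legacy path is hypothetical, today that derivation is
inline):

	if (a6xx_info->gmu_chipid)
		chipid = a6xx_info->gmu_chipid;
	else
		chipid = a6xx_legacy_gmu_chipid(adreno_gpu);	/* existing derivation */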

-Akhil.
> 
> BR,
> -R
> 
> > >
> > > > /* NOTE: A730 may also fall in this if-condition with a future GMU 
> > > > fw update. */
> > > > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > > > +   } else if (adreno_is_a7xx(adreno_gpu) && 
> > > > !adreno_is_a730(adreno_gpu)) {
> > > > /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 
> > > > */
> > > > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> > > > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > > > device *dev, u32 *votes,
> > > > if (!pri_count)
> > > > return -EINVAL;
> > > > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > > -   if (IS_ERR(sec))
> > > > -   return PTR_ERR(sec);
> > > > +   /*
> > > > +* Some targets have a separate gfx mxc rail. So try to read that 
> > > > first and then fall back
> > > > +* to regular mx rail if it is missing
> > > > +*/
> > > > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > > > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > > > +   return -EPROBE_DEFER;
> > > > +   } else if (IS_ERR(sec)) {
> > > > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > > +   if (IS_ERR(sec))
> > > > +   return PTR_ERR(sec);
> > > > +   }
> > >
> > > I assume GMXC would always be used if present, although please use the
> > > approach Dmitry suggested
> >
> > Correct.
> >
> > -Akhil
> > >
> > >
> > > The rest looks good!
> > >
> > > Konrad


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Akhil P Oommen
On Mon, Jun 24, 2024 at 07:28:06AM -0700, Rob Clark wrote:
> On Mon, Jun 24, 2024 at 7:25 AM Rob Clark  wrote:
> >
> > On Sun, Jun 23, 2024 at 4:08 AM Akhil P Oommen  
> > wrote:
> > >
> > > Add support in drm/msm driver for the Adreno X185 gpu found in
> > > Snapdragon X1 Elite chipset.
> > >
> > > Signed-off-by: Akhil P Oommen 
> > > ---
> > >
> > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> > >  drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> > >  4 files changed, 36 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > > unsigned int state)
> > >  */
> > > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> > >
> > > +   if (adreno_is_x185(adreno_gpu)) {
> > > +   chipid = 0x7050001;
> > > /* NOTE: A730 may also fall in this if-condition with a future 
> > > GMU fw update. */
> > > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > > +   } else if (adreno_is_a7xx(adreno_gpu) && 
> > > !adreno_is_a730(adreno_gpu)) {
> > > /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 
> > > 7 */
> > > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> > >
> > > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > > device *dev, u32 *votes,
> > > if (!pri_count)
> > > return -EINVAL;
> > >
> > > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > -   if (IS_ERR(sec))
> > > -   return PTR_ERR(sec);
> > > +   /*
> > > +* Some targets have a separate gfx mxc rail. So try to read that 
> > > first and then fall back
> > > +* to regular mx rail if it is missing
> > > +*/
> > > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > > +   return -EPROBE_DEFER;
> > > +   } else if (IS_ERR(sec)) {
> > > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > +   if (IS_ERR(sec))
> > > +   return PTR_ERR(sec);
> > > +   }
> > >
> > > sec_count >>= 1;
> > > if (!sec_count)
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 973872ad0474..97837f7f2a40 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -1319,9 +1319,7 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
> > > count = ARRAY_SIZE(a660_protect);
> > > count_max = 48;
> > > BUILD_BUG_ON(ARRAY_SIZE(a660_protect) > 48);
> > > -   } else if (adreno_is_a730(adreno_gpu) ||
> > > -  adreno_is_a740(adreno_gpu) ||
> > > -  adreno_is_a750(adreno_gpu)) {
> > > +   } else if (adreno_is_a7xx(adreno_gpu)) {
> > > regs = a730_protect;
> > > count = ARRAY_SIZE(a730_protect);
> > > count_max = 48;
> > > @@ -1891,7 +1889,7 @@ static int hw_init(struct msm_gpu *gpu)
> > > gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
> > >
> > > /* Set weights for bicubic filtering */
> > > -   if (adreno_is_a650_family(adreno_gpu)) {
> > > +   if (adreno_is_a650_family(adreno_gpu) || 
> > > adreno_is_x185(adreno_gpu)) {
> > > gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
> > > gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
> > > 0x3fe05ff4);
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> > > b/drivers/gpu/drm/msm/ad

Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Akhil P Oommen
On Mon, Jun 24, 2024 at 03:53:48PM +0200, Konrad Dybcio wrote:
> 
> 
> On 6/23/24 13:06, Akhil P Oommen wrote:
> > Add support in drm/msm driver for the Adreno X185 gpu found in
> > Snapdragon X1 Elite chipset.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > 
> >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> >   drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> >   4 files changed, 36 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > unsigned int state)
> >  */
> > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> > +   if (adreno_is_x185(adreno_gpu)) {
> > +   chipid = 0x7050001;
> 
> What's wrong with using the logic below?

patchid is in bits [7:0], not [15:8], in the case of x185. Due to the
changes in the chipid scheme within the a7xx family, this is a bit
confusing. I will try to improve here in another series.
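
For illustration only, this is how I read the hardcoded value (field
names other than the patchid are my guess, not taken from the driver):

        u32 chipid = FIELD_PREP(GENMASK(31, 24), 0x7) |  /* gen/core      */
                     FIELD_PREP(GENMASK(23, 16), 0x5) |  /* major         */
                     FIELD_PREP(GENMASK(15, 8),  0x0) |  /* minor         */
                     FIELD_PREP(GENMASK(7, 0),   0x1);   /* patchid [7:0] */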

> 
> > /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
> > update. */
> > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > +   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
> > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > device *dev, u32 *votes,
> > if (!pri_count)
> > return -EINVAL;
> > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > -   if (IS_ERR(sec))
> > -   return PTR_ERR(sec);
> > +   /*
> > +* Some targets have a separate gfx mxc rail. So try to read that first 
> > and then fall back
> > +* to regular mx rail if it is missing
> > +*/
> > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > +   return -EPROBE_DEFER;
> > +   } else if (IS_ERR(sec)) {
> > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > +   if (IS_ERR(sec))
> > +   return PTR_ERR(sec);
> > +   }
> 
> I assume GMXC would always be used if present, although please use the
> approach Dmitry suggested

Correct.

-Akhil
> 
> 
> The rest looks good!
> 
> Konrad


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Akhil P Oommen
On Mon, Jun 24, 2024 at 12:21:30AM +0300, Dmitry Baryshkov wrote:
> On Sun, Jun 23, 2024 at 04:36:29PM GMT, Akhil P Oommen wrote:
> > Add support in drm/msm driver for the Adreno X185 gpu found in
> > Snapdragon X1 Elite chipset.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > 
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> >  drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> >  4 files changed, 36 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > unsigned int state)
> >  */
> > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> >  
> > +   if (adreno_is_x185(adreno_gpu)) {
> > +   chipid = 0x7050001;
> > /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
> > update. */
> > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > +   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
> > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> >  
> > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > device *dev, u32 *votes,
> > if (!pri_count)
> > return -EINVAL;
> >  
> > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > -   if (IS_ERR(sec))
> > -   return PTR_ERR(sec);
> > +   /*
> > +* Some targets have a separate gfx mxc rail. So try to read that first 
> > and then fall back
> > +* to regular mx rail if it is missing
> 
> Can we use compatibles / flags to detect this?

I prefer the current approach so that we don't need to keep adding
checks here for future targets. If gmxc is preferred, we have to use that
in all targets.

> 
> > +*/
> > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > +   return -EPROBE_DEFER;
> > +   } else if (IS_ERR(sec)) {
> > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > +   if (IS_ERR(sec))
> > +   return PTR_ERR(sec);
> > +   }
> 
> The following code might be slightly more idiomatic:
> 
>   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
>   if (IS_ERR(sec) && sec != ERR_PTR(-EPROBE_DEFER))
>   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
>   if (IS_ERR(sec))
>   return PTR_ERR(sec);
>
Ack. This is neater!

> 
> >  
> > sec_count >>= 1;
> > if (!sec_count)
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 973872ad0474..97837f7f2a40 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1319,9 +1319,7 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
> > count = ARRAY_SIZE(a660_protect);
> > count_max = 48;
> > BUILD_BUG_ON(ARRAY_SIZE(a660_protect) > 48);
> > -   } else if (adreno_is_a730(adreno_gpu) ||
> > -  adreno_is_a740(adreno_gpu) ||
> > -  adreno_is_a750(adreno_gpu)) {
> > +   } else if (adreno_is_a7xx(adreno_gpu)) {
> > regs = a730_protect;
> > count = ARRAY_SIZE(a730_protect);
> > count_max = 48;
> > @@ -1891,7 +1889,7 @@ static int hw_init(struct msm_gpu *gpu)
> > gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
> >  
> > /* Set weights for bicubic filtering */
> > -   if (adreno_is_a650_family(adreno_gpu)) {
> > +   if (adreno_is_a650_family(adreno_gpu) || adreno_is_x185(adreno_gpu)) {
> > gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
> > gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
> > 0x3fe05ff4);
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index c3703a51287b..139c7d828749 100644
> > --- a/drivers/gpu/drm/msm/ad

Re: [PATCH v1 1/3] dt-bindings: display/msm/gmu: Add Adreno X185 GMU

2024-06-26 Thread Akhil P Oommen
On Sun, Jun 23, 2024 at 02:40:14PM +0200, Krzysztof Kozlowski wrote:
> On 23/06/2024 13:06, Akhil P Oommen wrote:
> > Document Adreno X185 GMU in the dt-binding specification.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > 
> >  Documentation/devicetree/bindings/display/msm/gmu.yaml | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
> > b/Documentation/devicetree/bindings/display/msm/gmu.yaml
> > index b3837368a260..9aa7151fd66f 100644
> > --- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
> > +++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
> > @@ -23,6 +23,9 @@ properties:
> >- items:
> >- pattern: '^qcom,adreno-gmu-[67][0-9][0-9]\.[0-9]$'
> >- const: qcom,adreno-gmu
> > +  - items:
> > +  - pattern: '^qcom,adreno-gmu-[x][1-9][0-9][0-9]\.[0-9]$'
> 
> '[x]' is odd. Should be just 'x'.

Ack
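
For reference, the fixed entry would presumably look like this (sketch
only, not the final binding):

  - items:
      - pattern: '^qcom,adreno-gmu-x[1-9][0-9][0-9]\.[0-9]$'
      - const: qcom,adreno-gmu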

-Akhil
> 
> 
> Best regards,
> Krzysztof
> 


Re: [PATCH v2 0/2] Clean up barriers

2024-06-25 Thread Akhil P Oommen
On Tue, Jun 25, 2024 at 08:54:40PM +0200, Konrad Dybcio wrote:
> Changes in v3:
> - Drop the wrapper functions
> - Drop the readback in GMU code
> - Split the commit in two
> 
> Link to v2: 
> https://lore.kernel.org/linux-arm-msm/20240509-topic-adreno-v2-1-b82a9f99b...@linaro.org/
> 
> Changes in v2:
> - Introduce gpu_write_flush() and use it
> - Don't accidentally break a630 by trying to write to non-existent GBIF
> 
> Link to v1: 
> https://lore.kernel.org/linux-arm-msm/20240508-topic-adreno-v1-1-1babd05c1...@linaro.org/
> 
> Signed-off-by: Konrad Dybcio 
> ---
> Konrad Dybcio (2):
>   drm/msm/adreno: De-spaghettify the use of memory barriers
>   Revert "drm/msm/a6xx: Poll for GBIF unhalt status in hw_init"
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  4 +---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++
>  2 files changed, 7 insertions(+), 11 deletions(-)
> ---
> base-commit: 0fc4bfab2cd45f9acb86c4f04b5191e114e901ed
> change-id: 20240625-adreno_barriers-29f356742418

for the whole series:
Reviewed-by: Akhil P Oommen 

-Akhil

> 
> Best regards,
> -- 
> Konrad Dybcio 
> 


Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-25 Thread Akhil P Oommen
On Tue, Jun 25, 2024 at 11:03:42AM -0700, Rob Clark wrote:
> On Tue, Jun 25, 2024 at 10:59 AM Akhil P Oommen  wrote:
> >
> > On Fri, Jun 21, 2024 at 02:09:58PM -0700, Rob Clark wrote:
> > > On Sat, Jun 8, 2024 at 8:44 AM Kiarash Hajian
> > >  wrote:
> > > >
> > > > The driver's memory regions are currently just ioremap()ed, but not
> > > > reserved through a request. That's not a bug, but having the request is
> > > > a little more robust.
> > > >
> > > > Implement the region-request through the corresponding managed
> > > > devres-function.
> > > >
> > > > Signed-off-by: Kiarash Hajian 
> > > > ---
> > > > Changes in v6:
> > > > -Fix compile error
> > > > -Link to v5: 
> > > > https://lore.kernel.org/all/20240607-memory-v1-1-8664f52fc...@gmail.com
> > > >
> > > > Changes in v5:
> > > > - Fix error hanlding problems.
> > > > - Link to v4: 
> > > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v4-1-3881a6408...@gmail.com
> > > >
> > > > Changes in v4:
> > > > - Combine v3 commits into a singel commit
> > > > - Link to v3: 
> > > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v3-0-0a728ad45...@gmail.com
> > > >
> > > > Changes in v3:
> > > > - Remove redundant devm_iounmap calls, relying on devres for 
> > > > automatic resource cleanup.
> > > >
> > > > Changes in v2:
> > > > - update the subject prefix to "drm/msm/a6xx:", to match the 
> > > > majority of other changes to this file.
> > > > ---
> > > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 
> > > > +++--
> > > >  1 file changed, 11 insertions(+), 22 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > index 8bea8ef26f77..d26cc6254ef9 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > @@ -525,7 +525,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > > > bool pdc_in_aop = false;
> > > >
> > > > if (IS_ERR(pdcptr))
> > > > -   goto err;
> > > > +   return;
> > > >
> > > > if (adreno_is_a650(adreno_gpu) ||
> > > > adreno_is_a660_family(adreno_gpu) ||
> > > > @@ -541,7 +541,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > > > if (!pdc_in_aop) {
> > > > seqptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc_seq");
> > > > if (IS_ERR(seqptr))
> > > > -   goto err;
> > > > +   return;
> > > > }
> > > >
> > > > /* Disable SDE clock gating */
> > > > @@ -633,12 +633,6 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu 
> > > > *gmu)
> > > > wmb();
> > > >
> > > > a6xx_rpmh_stop(gmu);
> > > > -
> > > > -err:
> > > > -   if (!IS_ERR_OR_NULL(pdcptr))
> > > > -   iounmap(pdcptr);
> > > > -   if (!IS_ERR_OR_NULL(seqptr))
> > > > -   iounmap(seqptr);
> > > >  }
> > > >
> > > >  /*
> > > > @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> > > > platform_device *pdev,
> > > > return ERR_PTR(-EINVAL);
> > > > }
> > > >
> > > > -   ret = ioremap(res->start, resource_size(res));
> > > > +   ret = devm_ioremap_resource(&pdev->dev, res);
> > >
> > > So, this doesn't actually work, failing in __request_region_locked(),
> > > because the gmu region partially overlaps with the gpucc region (which
> > > is busy).  I think this is intentional, since gmu is controlling the
> > > gpu clocks, etc.  In particular REG_A6XX_GPU_CC_GX_GDSCR is in this
> > > overlapping region.  Maybe Akhil knows more about GMU.
> >
> > We don't really need to map gpucc region from driver on behalf of gmu.
> > Since we don't access any gpucc register from drm-msm driver, we can
> > upd

Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-06-25 Thread Akhil P Oommen
On Tue, Jun 18, 2024 at 10:08:23PM +0530, Akhil P Oommen wrote:
> On Tue, Jun 04, 2024 at 07:35:04PM +0200, Konrad Dybcio wrote:
> > 
> > 
> > On 5/14/24 20:38, Akhil P Oommen wrote:
> > > On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> > > > Memory barriers help ensure instruction ordering, NOT time and order
> > > > of actual write arrival at other observers (e.g. memory-mapped IP).
> > > > On architectures employing weak memory ordering, the latter can be a
> > > > giant pain point, and it has been as part of this driver.
> > > > 
> > > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > > > readl/writel, which include r/w (respectively) barriers.
> > > > 
> > > > Replace the barriers with a readback that ensures the previous writes
> > > > have exited the write buffer (as the CPU must flush the write to the
> > > > register it's trying to read back) and subsequently remove the hack
> > > > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> > > > status in hw_init").
> > > > 
> > > > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in 
> > > > hw_init")
> > > > Signed-off-by: Konrad Dybcio 
> > > > ---
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 --
> > > >   2 files changed, 6 insertions(+), 13 deletions(-)
> > > 
> > > I prefer this version compared to the v2. A helper routine is
> > > unnecessary here because:
> > > 1. there are very few scenarios where we have to read back the same
> > > register.
> > > 2. we may accidently readback a write only register.
> > 
> > Which would still trigger an address dependency on the CPU, no?
> 
> Yes, but it is not a good idea to read a write-only register. We can't be
> sure about its effect on the endpoint.
> 
> > 
> > > 
> > > > 
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > index 0e3dfd4c2bc8..4135a53b55a7 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> > > > int ret;
> > > > u32 val;
> > > > -   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > > > -   /* Wait for the register to finish posting */
> > > > -   wmb();
> > > > +   gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> > > > +   gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> > > 
> > > This is unnecessary because we are polling on a register on the same port 
> > > below. But I think we
> > > can replace "wmb()" above with "mb()" to avoid reordering between read
> > > and write IO instructions.
> > 
> > Ok on the dropping readback part
> > 
> > + AFAIU from Will's response, we can drop the barrier as well

Yes, let's drop the barrier.

> 
> Lets wait a bit on Will's response on compiler reordering.
> 
> > 
> > > 
> > > > ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> > > > val & (1 << 1), 100, 1);
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > index 973872ad0474..0acbc38b8e70 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
> > > > }
> > > > /* Clear GBIF halt in case GX domain was not collapsed */
> > > > +   gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > > 
> > > We need a full barrier here to avoid reordering. Also, lets add a
> > > comment about why we are doing this odd looking sequence.

Please ignore this.

> > > 
> > > > +   gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > > > if (adreno_is_a619_holi(adreno_gpu)) {
> > > > -   gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > > > gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > > > -   /* Let's make extr

Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-06-25 Thread Akhil P Oommen
On Thu, Jun 20, 2024 at 02:04:01PM +0100, Will Deacon wrote:
> On Tue, Jun 18, 2024 at 09:41:58PM +0530, Akhil P Oommen wrote:
> > On Tue, Jun 04, 2024 at 03:40:56PM +0100, Will Deacon wrote:
> > > On Thu, May 16, 2024 at 01:55:26PM -0500, Andrew Halaney wrote:
> > > > On Thu, May 16, 2024 at 08:20:05PM GMT, Akhil P Oommen wrote:
> > > > > On Thu, May 16, 2024 at 08:15:34AM -0500, Andrew Halaney wrote:
> > > > > > If I understand correctly, you don't need any memory barrier.
> > > > > > writel()/readl()'s are ordered to the same endpoint. That goes for 
> > > > > > all
> > > > > > the reordering/barrier comments mentioned below too.
> > > > > > 
> > > > > > device-io.rst:
> > > > > > 
> > > > > > The read and write functions are defined to be ordered. That is 
> > > > > > the
> > > > > > compiler is not permitted to reorder the I/O sequence. When the 
> > > > > > ordering
> > > > > > can be compiler optimised, you can use __readb() and friends to
> > > > > > indicate the relaxed ordering. Use this with care.
> > > > > > 
> > > > > > memory-barriers.txt:
> > > > > > 
> > > > > >  (*) readX(), writeX():
> > > > > > 
> > > > > > The readX() and writeX() MMIO accessors take a pointer to 
> > > > > > the
> > > > > > peripheral being accessed as an __iomem * parameter. For 
> > > > > > pointers
> > > > > > mapped with the default I/O attributes (e.g. those returned 
> > > > > > by
> > > > > > ioremap()), the ordering guarantees are as follows:
> > > > > > 
> > > > > > 1. All readX() and writeX() accesses to the same peripheral 
> > > > > > are ordered
> > > > > >with respect to each other. This ensures that MMIO 
> > > > > > register accesses
> > > > > >by the same CPU thread to a particular device will 
> > > > > > arrive in program
> > > > > >order.
> > > > > > 
> > > > > 
> > > > > In arm64, a writel followed by readl translates to roughly the 
> > > > > following
> > > > > sequence: dmb_wmb(), __raw_writel(), __raw_readl(), dmb_rmb(). I am 
> > > > > not
> > > > > sure what is stopping compiler from reordering  __raw_writel() and 
> > > > > __raw_readl()
> > > > > above? I am assuming iomem cookie is ignored during compilation.
> > > > 
> > > > It seems to me that is due to some usage of volatile there in
> > > > __raw_writel() etc, but to be honest after reading about volatile and
> > > > some threads from gcc mailing lists, I don't have a confident answer :)
> > > > 
> > > > > 
> > > > > Added Will to this thread if he can throw some light on this.
> > > > 
> > > > Hopefully Will can school us.
> > > 
> > > The ordering in this case is ensured by the memory attributes used for
> > > ioremap(). When an MMIO region is mapped using Device-nGnRE attributes
> > > (as it the case for ioremap()), the "nR" part means "no reordering", so
> > > readX() and writeX() to that region are ordered wrt each other.
> > 
> > But that avoids only HW reordering, doesn't it? What about *compiler 
> > reordering* in the
> > case of a writel following by a readl which translates to:
> > 1: dmb_wmb()
> > 2: __raw_writel() -> roughly "asm volatile('str')
> > 3: __raw_readl() -> roughly "asm volatile('ldr')
> > 4: dmb_rmb()
> > 
> > Is the 'volatile' keyword sufficient to avoid reordering between (2) and 
> > (3)? Or
> > do we need a "memory" clobber to inhibit reordering?
> > 
> > This is still not clear to me even after going through some compiler 
> > documentions.
> 
> I don't think the compiler should reorder volatile asm blocks wrt each
> other.
> 

Thanks, Will, for the confirmation.

-Akhil.

> Will


Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-25 Thread Akhil P Oommen
On Fri, Jun 21, 2024 at 02:09:58PM -0700, Rob Clark wrote:
> On Sat, Jun 8, 2024 at 8:44 AM Kiarash Hajian
>  wrote:
> >
> > The driver's memory regions are currently just ioremap()ed, but not
> > reserved through a request. That's not a bug, but having the request is
> > a little more robust.
> >
> > Implement the region-request through the corresponding managed
> > devres-function.
> >
> > Signed-off-by: Kiarash Hajian 
> > ---
> > Changes in v6:
> > -Fix compile error
> > -Link to v5: 
> > https://lore.kernel.org/all/20240607-memory-v1-1-8664f52fc...@gmail.com
> >
> > Changes in v5:
> > - Fix error hanlding problems.
> > - Link to v4: 
> > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v4-1-3881a6408...@gmail.com
> >
> > Changes in v4:
> > - Combine v3 commits into a singel commit
> > - Link to v3: 
> > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v3-0-0a728ad45...@gmail.com
> >
> > Changes in v3:
> > - Remove redundant devm_iounmap calls, relying on devres for automatic 
> > resource cleanup.
> >
> > Changes in v2:
> > - update the subject prefix to "drm/msm/a6xx:", to match the majority 
> > of other changes to this file.
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 
> > +++--
> >  1 file changed, 11 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 8bea8ef26f77..d26cc6254ef9 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -525,7 +525,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > bool pdc_in_aop = false;
> >
> > if (IS_ERR(pdcptr))
> > -   goto err;
> > +   return;
> >
> > if (adreno_is_a650(adreno_gpu) ||
> > adreno_is_a660_family(adreno_gpu) ||
> > @@ -541,7 +541,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > if (!pdc_in_aop) {
> > seqptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc_seq");
> > if (IS_ERR(seqptr))
> > -   goto err;
> > +   return;
> > }
> >
> > /* Disable SDE clock gating */
> > @@ -633,12 +633,6 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > wmb();
> >
> > a6xx_rpmh_stop(gmu);
> > -
> > -err:
> > -   if (!IS_ERR_OR_NULL(pdcptr))
> > -   iounmap(pdcptr);
> > -   if (!IS_ERR_OR_NULL(seqptr))
> > -   iounmap(seqptr);
> >  }
> >
> >  /*
> > @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> > platform_device *pdev,
> > return ERR_PTR(-EINVAL);
> > }
> >
> > -   ret = ioremap(res->start, resource_size(res));
> > +   ret = devm_ioremap_resource(&pdev->dev, res);
> 
> So, this doesn't actually work, failing in __request_region_locked(),
> because the gmu region partially overlaps with the gpucc region (which
> is busy).  I think this is intentional, since gmu is controlling the
> gpu clocks, etc.  In particular REG_A6XX_GPU_CC_GX_GDSCR is in this
> overlapping region.  Maybe Akhil knows more about GMU.

We don't really need to map the gpucc region from the driver on behalf of
the GMU. Since we don't access any gpucc register from the drm-msm driver,
we can update the range size to correct this. But given the backward
compatibility requirement with older DTs, can we still enable region
locking? I would prefer that if it is possible.

FYI, kgsl accesses gpucc registers to ensure the gdsc has collapsed. So the
gpucc region has to be mapped by kgsl, and that is reflected in the kgsl
device tree.
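
To make the backward-compat idea concrete, something like this untested
sketch is what I have in mind: reserve the region where we can, and fall
back to a plain mapping on older DTs where the gmu range still overlaps
gpucc.

        void __iomem *base;

        base = devm_ioremap_resource(&pdev->dev, res);
        if (IS_ERR(base))
                base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
        if (!base)
                return ERR_PTR(-EINVAL);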

-Akhil

> 
> BR,
> -R
> 
> > if (!ret) {
> > DRM_DEV_ERROR(&pdev->dev, "Unable to map the %s 
> > registers\n", name);
> > return ERR_PTR(-EINVAL);
> > @@ -1613,13 +1607,13 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu 
> > *a6xx_gpu, struct device_node *node)
> > gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> > if (IS_ERR(gmu->mmio)) {
> > ret = PTR_ERR(gmu->mmio);
> > -   goto err_mmio;
> > +   goto err_cleanup;
> > }
> >
> > gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> > if (IS_ERR(gmu->cxpd)) {
> > ret = PTR_ERR(gmu->cxpd);
> > -   goto err_mmio;
> > +   goto err_cleanup;
> > }
> >
> > if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> > @@ -1635,7 +1629,7 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, 
> > struct device_node *node)
> > gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> > if (IS_ERR(gmu->gxpd)) {
> > ret = PTR_ERR(gmu->gxpd);
> > -   goto err_mmio;
> > +   goto err_cleanup;
> > }
> >
> > gmu->initialized = true;
> > @@ -1645,9 +1639,7 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, 

Re: [PATCH v1 0/3] Support for Adreno X1-85 GPU

2024-06-23 Thread Akhil P Oommen
On Sun, Jun 23, 2024 at 01:11:48PM +0200, Krzysztof Kozlowski wrote:
> On 23/06/2024 13:06, Akhil P Oommen wrote:
> > This series adds support for the Adreno X1-85 GPU found in Qualcomm's
> > compute series chipset, Snapdragon X1 Elite (x1e80100). In this new
> > naming scheme for Adreno GPU, 'X' stands for compute series, '1' denotes
> > 1st generation and '8' & '5' denotes the tier and the SKU which it
> > belongs.
> > 
> > X1-85 has major focus on doubling core clock frequency and bandwidth
> > throughput. It has a dedicated collapsible Graphics MX rail (gmxc) to
> > power the memories and double the number of data channels to improve
> > bandwidth to DDR.
> > 
> > Mesa has the necessary bits present already to support this GPU. We are
> > able to bring up Gnome desktop by hardcoding "0x43050a01" as
> > chipid. Also, verified glxgears and glmark2. We have plans to add the
> > new chipid support to Mesa in next few weeks, but these patches can go in
> > right away to get included in v6.11.
> > 
> > This series is rebased on top of v6.10-rc4. P3 cherry-picks cleanly on
> > qcom/for-next.
> > 
> > P1 & P2 for Rob, P3 for Bjorn to pick up.
> 
> Which Rob?

Sorry for the confusion! I meant Rob Clark, whom I had added to the "To:"
list.

-Akhil

> 
> Why bindings cannot go as usual way - via the subsystem?
> 
> Best regards,
> Krzysztof
> 
> 


Re: [PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-23 Thread Akhil P Oommen
On Sun, Jun 23, 2024 at 03:40:06PM -0500, Bjorn Andersson wrote:
> On Sun, Jun 23, 2024 at 08:46:30PM GMT, Akhil P Oommen wrote:
> > On Sun, Jun 23, 2024 at 02:53:17PM +0200, Krzysztof Kozlowski wrote:
> > > On 23/06/2024 14:28, Akhil P Oommen wrote:
> > > > On Sun, Jun 23, 2024 at 01:17:16PM +0200, Krzysztof Kozlowski wrote:
> > > >> On 23/06/2024 13:06, Akhil P Oommen wrote:
> > > >>> Add the necessary dt nodes for gpu support in X1E80100.
> > > >>>
> > > >>> Signed-off-by: Akhil P Oommen 
> > > >>> ---
> > > >>> + gmu: gmu@3d6a000 {
> > > >>> + compatible = "qcom,adreno-gmu-x185.1", 
> > > >>> "qcom,adreno-gmu";
> > > >>> + reg = <0x0 0x03d5 0x0 0x1>,
> > > >>> +   <0x0 0x03d6a000 0x0 0x35000>,
> > > >>> +   <0x0 0x0b28 0x0 0x1>;
> > > >>> + reg-names =  "rscc", "gmu", "gmu_pdc";
> > > >>
> > > >> Really, please start testing your patches. Your internal instructions
> > > >> tells you to do that, so please follow it carefully. Don't use the
> > > >> community as the tool, because you do not want to run checks and
> > > >> investigate results.
> > > > 
> > > > This was obviously tested before (and retested now) and everything 
> > > > works. I am
> > > > confused about what you meant. Could you please elaborate a bit? The 
> > > > device
> > > > and the compilation/test setup is new for me, so I am wondering if I
> > > > made any silly mistake!
> > > 
> > > Eh, your DTS is not correct, but this could not be pointed out by tests,
> > > because the binding does not work. :(
> > 
> > I reordered both "reg" and "reg-names" arrays based on the address.
> 
> The @3d6a000 should match the first reg entry.
> 
> > Not sure if
> > that is what we are talking about here. Gpu driver uses 
> > platform_get_resource_byname()
> > to query mmio resources.
> > 
> > I will retest dt-bindings and dts checks after picking the patches you
> > just posted and report back. Is the schema supposed to enforce strict
> > order?
> 
> In your second hunk in patch 1, you are defining the order of reg,
> reg-names, clocks, and clock-names. This creates an ABI between DTB and
> implementation where ordering is significant - regardless of Linux using
> platform_get_resource_byname().

I will revert this to the original order. Thanks for the clarification, 
Bjorn/Krzysztof.

-Akhil.
> 
> Regards,
> Bjorn
> 
> > 
> > -Akhil.
> > > 
> > > I'll fix up the binding and then please test on top of my patch (see
> > > your internal guideline about necessary tests before sending any binding
> > > or DTS patch).
> > > 
> > > Best regards,
> > > Krzysztof
> > > 


Re: [PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-23 Thread Akhil P Oommen
On Sun, Jun 23, 2024 at 02:53:17PM +0200, Krzysztof Kozlowski wrote:
> On 23/06/2024 14:28, Akhil P Oommen wrote:
> > On Sun, Jun 23, 2024 at 01:17:16PM +0200, Krzysztof Kozlowski wrote:
> >> On 23/06/2024 13:06, Akhil P Oommen wrote:
> >>> Add the necessary dt nodes for gpu support in X1E80100.
> >>>
> >>> Signed-off-by: Akhil P Oommen 
> >>> ---
> >>> + gmu: gmu@3d6a000 {
> >>> + compatible = "qcom,adreno-gmu-x185.1", 
> >>> "qcom,adreno-gmu";
> >>> + reg = <0x0 0x03d5 0x0 0x1>,
> >>> +   <0x0 0x03d6a000 0x0 0x35000>,
> >>> +   <0x0 0x0b28 0x0 0x1>;
> >>> + reg-names =  "rscc", "gmu", "gmu_pdc";
> >>
> >> Really, please start testing your patches. Your internal instructions
> >> tells you to do that, so please follow it carefully. Don't use the
> >> community as the tool, because you do not want to run checks and
> >> investigate results.
> > 
> > This was obviously tested before (and retested now) and everything works. I 
> > am
> > confused about what you meant. Could you please elaborate a bit? The device
> > and the compilation/test setup is new for me, so I am wondering if I
> > made any silly mistake!
> 
> Eh, your DTS is not correct, but this could not be pointed out by tests,
> because the binding does not work. :(

I reordered both "reg" and "reg-names" arrays based on the address. Not sure if
that is what we are talking about here. Gpu driver uses 
platform_get_resource_byname()
to query mmio resources.

I will retest dt-bindings and dts checks after picking the patches you
just posted and report back. Is the schema supposed to enforce strict
order?

-Akhil.
> 
> I'll fix up the binding and then please test on top of my patch (see
> your internal guideline about necessary tests before sending any binding
> or DTS patch).
> 
> Best regards,
> Krzysztof
> 


Re: [PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-23 Thread Akhil P Oommen
On Sun, Jun 23, 2024 at 01:17:16PM +0200, Krzysztof Kozlowski wrote:
> On 23/06/2024 13:06, Akhil P Oommen wrote:
> > Add the necessary dt nodes for gpu support in X1E80100.
> > 
> > Signed-off-by: Akhil P Oommen 
> > ---
> > +   gmu: gmu@3d6a000 {
> > +   compatible = "qcom,adreno-gmu-x185.1", 
> > "qcom,adreno-gmu";
> > +   reg = <0x0 0x03d5 0x0 0x1>,
> > + <0x0 0x03d6a000 0x0 0x35000>,
> > + <0x0 0x0b28 0x0 0x1>;
> > +   reg-names =  "rscc", "gmu", "gmu_pdc";
> 
> Really, please start testing your patches. Your internal instructions
> tells you to do that, so please follow it carefully. Don't use the
> community as the tool, because you do not want to run checks and
> investigate results.

This was obviously tested before (and retested now) and everything works. I am
confused about what you meant. Could you please elaborate a bit? The device
and the compilation/test setup are new to me, so I am wondering if I
made any silly mistake!

-Akhil.

> 
> NAK.
> 
> Best regards,
> Krzysztof
> 


[PATCH v1 3/3] arm64: dts: qcom: x1e80100: Add gpu support

2024-06-23 Thread Akhil P Oommen
Add the necessary dt nodes for gpu support in X1E80100.

Signed-off-by: Akhil P Oommen 
---

 arch/arm64/boot/dts/qcom/x1e80100.dtsi | 195 +
 1 file changed, 195 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi 
b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index 5f90a0b3c016..3e887286bab4 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2985,6 +2986,200 @@ tcsr: clock-controller@1fc {
#reset-cells = <1>;
};
 
+   gpu: gpu@3d0 {
+   compatible = "qcom,adreno-43050c01", "qcom,adreno";
+   reg = <0x0 0x03d0 0x0 0x4>,
+ <0x0 0x03d61000 0x0 0x800>,
+ <0x0 0x03d9e000 0x0 0x1000>;
+
+   reg-names = "kgsl_3d0_reg_memory",
+   "cx_dbgc",
+   "cx_mem";
+
+   interrupts = ;
+
+   iommus = <&adreno_smmu 0 0x0>,
+<&adreno_smmu 1 0x0>;
+
+   operating-points-v2 = <&gpu_opp_table>;
+
+   qcom,gmu = <&gmu>;
+   #cooling-cells = <2>;
+
+   interconnects = <&gem_noc MASTER_GFX3D 0 &mc_virt 
SLAVE_EBI1 0>;
+   interconnect-names = "gfx-mem";
+
+   zap-shader {
+   memory-region = <&gpu_microcode_mem>;
+   firmware-name = "qcom/gen70500_zap.mbn";
+   };
+
+   gpu_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-11 {
+   opp-hz = /bits/ 64 <11>;
+   opp-level = 
;
+   opp-peak-kBps = <1650>;
+   };
+
+   opp-10 {
+   opp-hz = /bits/ 64 <10>;
+   opp-level = 
;
+   opp-peak-kBps = <14398438>;
+   };
+
+   opp-92500 {
+   opp-hz = /bits/ 64 <92500>;
+   opp-level = 
;
+   opp-peak-kBps = <14398438>;
+   };
+
+   opp-8 {
+   opp-hz = /bits/ 64 <8>;
+   opp-level = ;
+   opp-peak-kBps = <12449219>;
+   };
+
+   opp-74400 {
+   opp-hz = /bits/ 64 <74400>;
+   opp-level = 
;
+   opp-peak-kBps = <10687500>;
+   };
+
+   opp-68700 {
+   opp-hz = /bits/ 64 <68700>;
+   opp-level = 
;
+   opp-peak-kBps = <8171875>;
+   };
+
+   opp-55000 {
+   opp-hz = /bits/ 64 <55000>;
+   opp-level = ;
+   opp-peak-kBps = <6074219>;
+   };
+
+   opp-39000 {
+   opp-hz = /bits/ 64 <39000>;
+   opp-level = 
;
+   opp-peak-kBps = <300>;
+   };
+
+   opp-3 {
+   opp-hz = /bits/ 64 <3>;
+   opp-level = 
;
+   opp-peak-kBps = <2136719>;
+   };
+   };
+   };
+
+   gmu: gmu@3d6a000 {
+   compatible = "qcom,adreno-gmu-x185.1", 
"qcom,adreno-gmu";
+   reg = <0x0 0x03d5 0x0 0x1>,
+ <0x0 0x03d6a000 0x0 0x35000>,
+ 

[PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-23 Thread Akhil P Oommen
Add support in drm/msm driver for the Adreno X185 gpu found in
Snapdragon X1 Elite chipset.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
 drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
 4 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 0e3dfd4c2bc8..168a4bddfaf2 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
 */
gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
 
+   if (adreno_is_x185(adreno_gpu)) {
+   chipid = 0x7050001;
/* NOTE: A730 may also fall in this if-condition with a future GMU fw 
update. */
-   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
+   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
/* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
 
@@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device 
*dev, u32 *votes,
if (!pri_count)
return -EINVAL;
 
-   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
-   if (IS_ERR(sec))
-   return PTR_ERR(sec);
+   /*
+* Some targets have a separate gfx mxc rail. So try to read that first 
and then fall back
+* to regular mx rail if it is missing
+*/
+   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
+   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
+   return -EPROBE_DEFER;
+   } else if (IS_ERR(sec)) {
+   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
+   if (IS_ERR(sec))
+   return PTR_ERR(sec);
+   }
 
sec_count >>= 1;
if (!sec_count)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 973872ad0474..97837f7f2a40 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1319,9 +1319,7 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
count = ARRAY_SIZE(a660_protect);
count_max = 48;
BUILD_BUG_ON(ARRAY_SIZE(a660_protect) > 48);
-   } else if (adreno_is_a730(adreno_gpu) ||
-  adreno_is_a740(adreno_gpu) ||
-  adreno_is_a750(adreno_gpu)) {
+   } else if (adreno_is_a7xx(adreno_gpu)) {
regs = a730_protect;
count = ARRAY_SIZE(a730_protect);
count_max = 48;
@@ -1891,7 +1889,7 @@ static int hw_init(struct msm_gpu *gpu)
gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
 
/* Set weights for bicubic filtering */
-   if (adreno_is_a650_family(adreno_gpu)) {
+   if (adreno_is_a650_family(adreno_gpu) || adreno_is_x185(adreno_gpu)) {
gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
0x3fe05ff4);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index c3703a51287b..139c7d828749 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -568,6 +568,20 @@ static const struct adreno_info gpulist[] = {
.zapfw = "a740_zap.mdt",
.hwcg = a740_hwcg,
.address_space_size = SZ_16G,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x43050c01), /* "C512v2" */
+   .family = ADRENO_7XX_GEN2,
+   .fw = {
+   [ADRENO_FW_SQE] = "gen70500_sqe.fw",
+   [ADRENO_FW_GMU] = "gen70500_gmu.bin",
+   },
+   .gmem = 3 * SZ_1M,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
+ ADRENO_QUIRK_HAS_HW_APRIV,
+   .init = a6xx_gpu_init,
+   .hwcg = a740_hwcg,
+   .address_space_size = SZ_16G,
}, {
.chip_ids = ADRENO_CHIP_IDS(0x43051401), /* "C520v2" */
.family = ADRENO_7XX_GEN3,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 77526892eb8c..d9ea8e0f6ad5 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -448,6 +448,11 @@ static inline int adreno_is_a750(struct adreno_gpu *gpu)
return gpu->info->chip_ids[0] == 0x4305

[PATCH v1 1/3] dt-bindings: display/msm/gmu: Add Adreno X185 GMU

2024-06-23 Thread Akhil P Oommen
Document Adreno X185 GMU in the dt-binding specification.

Signed-off-by: Akhil P Oommen 
---

 Documentation/devicetree/bindings/display/msm/gmu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index b3837368a260..9aa7151fd66f 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -23,6 +23,9 @@ properties:
   - items:
   - pattern: '^qcom,adreno-gmu-[67][0-9][0-9]\.[0-9]$'
   - const: qcom,adreno-gmu
+  - items:
+  - pattern: '^qcom,adreno-gmu-[x][1-9][0-9][0-9]\.[0-9]$'
+  - const: qcom,adreno-gmu
   - const: qcom,adreno-gmu-wrapper
 
   reg:
@@ -225,6 +228,7 @@ allOf:
   - qcom,adreno-gmu-730.1
   - qcom,adreno-gmu-740.1
   - qcom,adreno-gmu-750.1
+  - qcom,adreno-gmu-x185.1
 then:
   properties:
 reg:
-- 
2.45.1



[PATCH v1 0/3] Support for Adreno X1-85 GPU

2024-06-23 Thread Akhil P Oommen
This series adds support for the Adreno X1-85 GPU found in Qualcomm's
compute series chipset, Snapdragon X1 Elite (x1e80100). In the new
naming scheme for Adreno GPUs, 'X' stands for the compute series, '1'
denotes the 1st generation, and '8' & '5' denote the tier and the SKU to
which it belongs.

X1-85 has a major focus on doubling core clock frequency and bandwidth
throughput. It has a dedicated collapsible Graphics MX rail (gmxc) to
power the memories, and double the number of data channels to improve
bandwidth to DDR.

Mesa already has the necessary bits to support this GPU. We are able to
bring up the GNOME desktop by hardcoding "0x43050a01" as the chipid.
Also verified glxgears and glmark2. We plan to add the new chipid
support to Mesa in the next few weeks, but these patches can go in
right away to get included in v6.11.

This series is rebased on top of v6.10-rc4. P3 cherry-picks cleanly on
qcom/for-next.

P1 & P2 for Rob, P3 for Bjorn to pick up.


Akhil P Oommen (3):
  dt-bindings: display/msm/gmu: Add Adreno X185 GMU
  drm/msm/adreno: Add support for X185 GPU
  arm64: dts: qcom: x1e80100: Add gpu support

 .../devicetree/bindings/display/msm/gmu.yaml  |   4 +
 arch/arm64/boot/dts/qcom/x1e80100.dtsi| 195 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  19 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |   6 +-
 drivers/gpu/drm/msm/adreno/adreno_device.c|  14 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   5 +
 6 files changed, 235 insertions(+), 8 deletions(-)

-- 
2.45.1



Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-06-18 Thread Akhil P Oommen
On Tue, Jun 04, 2024 at 07:35:04PM +0200, Konrad Dybcio wrote:
> 
> 
> On 5/14/24 20:38, Akhil P Oommen wrote:
> > On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> > > Memory barriers help ensure instruction ordering, NOT time and order
> > > of actual write arrival at other observers (e.g. memory-mapped IP).
> > > On architectures employing weak memory ordering, the latter can be a
> > > giant pain point, and it has been as part of this driver.
> > > 
> > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > > readl/writel, which include r/w (respectively) barriers.
> > > 
> > > Replace the barriers with a readback that ensures the previous writes
> > > have exited the write buffer (as the CPU must flush the write to the
> > > register it's trying to read back) and subsequently remove the hack
> > > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> > > status in hw_init").
> > > 
> > > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in 
> > > hw_init")
> > > Signed-off-by: Konrad Dybcio 
> > > ---
> > >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 --
> > >   2 files changed, 6 insertions(+), 13 deletions(-)
> > 
> > I prefer this version compared to the v2. A helper routine is
> > unnecessary here because:
> > 1. there are very few scenarios where we have to read back the same
> > register.
> > 2. we may accidently readback a write only register.
> 
> Which would still trigger an address dependency on the CPU, no?

Yes, but it is not a good idea to read a write-only register. We can't be
sure about its effect on the endpoint.

> 
> > 
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 0e3dfd4c2bc8..4135a53b55a7 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> > >   int ret;
> > >   u32 val;
> > > - gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > > - /* Wait for the register to finish posting */
> > > - wmb();
> > > + gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> > > + gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> > 
> > This is unnecessary because we are polling on a register on the same port 
> > below. But I think we
> > can replace "wmb()" above with "mb()" to avoid reordering between read
> > and write IO instructions.
> 
> Ok on the dropping readback part
> 
> + AFAIU from Will's response, we can drop the barrier as well

Let's wait a bit for Will's response on the compiler reordering.

> 
> > 
> > >   ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> > >   val & (1 << 1), 100, 1);
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 973872ad0474..0acbc38b8e70 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
> > >   }
> > >   /* Clear GBIF halt in case GX domain was not collapsed */
> > > + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > 
> > We need a full barrier here to avoid reordering. Also, lets add a
> > comment about why we are doing this odd looking sequence.
> > 
> > > + gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > >   if (adreno_is_a619_holi(adreno_gpu)) {
> > > - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > >   gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > > - /* Let's make extra sure that the GPU can access the memory.. */
> > > - mb();
> > 
> > We need a full barrier here.
> > 
> > > + gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
> > >   } else if (a6xx_has_gbif(adreno_gpu)) {
> > > - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > >   gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > > - /* Let's make extra sure that the GPU can access the memory.. */
> > > - mb();
> > 
> > We need a full barrier here.
> 
> N

Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-06-18 Thread Akhil P Oommen
On Tue, Jun 04, 2024 at 03:40:56PM +0100, Will Deacon wrote:
> On Thu, May 16, 2024 at 01:55:26PM -0500, Andrew Halaney wrote:
> > On Thu, May 16, 2024 at 08:20:05PM GMT, Akhil P Oommen wrote:
> > > On Thu, May 16, 2024 at 08:15:34AM -0500, Andrew Halaney wrote:
> > > > If I understand correctly, you don't need any memory barrier.
> > > > writel()/readl()'s are ordered to the same endpoint. That goes for all
> > > > the reordering/barrier comments mentioned below too.
> > > > 
> > > > device-io.rst:
> > > > 
> > > > The read and write functions are defined to be ordered. That is the
> > > > compiler is not permitted to reorder the I/O sequence. When the 
> > > > ordering
> > > > can be compiler optimised, you can use __readb() and friends to
> > > > indicate the relaxed ordering. Use this with care.
> > > > 
> > > > memory-barriers.txt:
> > > > 
> > > >  (*) readX(), writeX():
> > > > 
> > > > The readX() and writeX() MMIO accessors take a pointer to 
> > > > the
> > > > peripheral being accessed as an __iomem * parameter. For 
> > > > pointers
> > > > mapped with the default I/O attributes (e.g. those returned 
> > > > by
> > > > ioremap()), the ordering guarantees are as follows:
> > > > 
> > > > 1. All readX() and writeX() accesses to the same peripheral 
> > > > are ordered
> > > >with respect to each other. This ensures that MMIO 
> > > > register accesses
> > > >by the same CPU thread to a particular device will 
> > > > arrive in program
> > > >order.
> > > > 
> > > 
> > > In arm64, a writel followed by readl translates to roughly the following
> > > sequence: dmb_wmb(), __raw_writel(), __raw_readl(), dmb_rmb(). I am not
> > > sure what is stopping compiler from reordering  __raw_writel() and 
> > > __raw_readl()
> > > above? I am assuming iomem cookie is ignored during compilation.
> > 
> > It seems to me that is due to some usage of volatile there in
> > __raw_writel() etc, but to be honest after reading about volatile and
> > some threads from gcc mailing lists, I don't have a confident answer :)
> > 
> > > 
> > > Added Will to this thread if he can throw some light on this.
> > 
> > Hopefully Will can school us.
> 
> The ordering in this case is ensured by the memory attributes used for
> ioremap(). When an MMIO region is mapped using Device-nGnRE attributes
> (as it the case for ioremap()), the "nR" part means "no reordering", so
> readX() and writeX() to that region are ordered wrt each other.

But that avoids only HW reordering, doesn't it? What about *compiler
reordering* in the case of a writel followed by a readl, which translates to:
1: dmb_wmb()
2: __raw_writel() -> roughly asm volatile("str ...")
3: __raw_readl()  -> roughly asm volatile("ldr ...")
4: dmb_rmb()

Is the 'volatile' keyword sufficient to avoid reordering between (2) and (3)? Or
do we need a "memory" clobber to inhibit reordering?

This is still not clear to me even after going through some compiler
documentation.
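
(For reference, a hand-simplified sketch of the accessors in question,
not the actual kernel source; note that neither asm block carries a
"memory" clobber:)

        static inline void my_raw_writel(u32 val, volatile void __iomem *addr)
        {
                asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
        }

        static inline u32 my_raw_readl(const volatile void __iomem *addr)
        {
                u32 val;

                asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr));
                return val;
        }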

-Akhil.

> 
> Note that guarantee _doesn't_ apply to other flavours of ioremap(), so
> e.g. ioremap_wc() won't give you the ordering.
> 
> Hope that helps,
> 
> Will


Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-05-16 Thread Akhil P Oommen
On Thu, May 16, 2024 at 08:15:34AM -0500, Andrew Halaney wrote:
> On Wed, May 15, 2024 at 12:08:49AM GMT, Akhil P Oommen wrote:
> > On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> > > Memory barriers help ensure instruction ordering, NOT time and order
> > > of actual write arrival at other observers (e.g. memory-mapped IP).
> > > On architectures employing weak memory ordering, the latter can be a
> > > giant pain point, and it has been as part of this driver.
> > > 
> > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > > readl/writel, which include r/w (respectively) barriers.
> > > 
> > > Replace the barriers with a readback that ensures the previous writes
> > > have exited the write buffer (as the CPU must flush the write to the
> > > register it's trying to read back) and subsequently remove the hack
> > > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> > > status in hw_init").
> 
> For what its worth, I've been eyeing (but haven't tested) sending some
> patches to clean up dsi_phy_write_udelay/ndelay(). There's no ordering
> guarantee between a writel() and a delay(), so the expected "write then
> delay" sequence might not be happening.. you need to write, read, delay.
> 
> memory-barriers.txt:
> 
>   5. A readX() by a CPU thread from the peripheral will complete before
>  any subsequent delay() loop can begin execution on the same thread.
>  This ensures that two MMIO register writes by the CPU to a peripheral
>  will arrive at least 1us apart if the first write is immediately read
>  back with readX() and udelay(1) is called prior to the second
>  writeX():
> 
>   writel(42, DEVICE_REGISTER_0); // Arrives at the device...
>   readl(DEVICE_REGISTER_0);
>   udelay(1);
>   writel(42, DEVICE_REGISTER_1); // ...at least 1us before this.

Yes, udelay() orders only with readl(). I saw a patch from Will Deacon
which fixes this for arm64 a few years back:
https://lore.kernel.org/all/1543251228-30001-1-git-send-email-will.dea...@arm.com/T/

But this is needed only when you write IO and then do a CPU-side wait, not
when you poll IO to check status.
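
In other words, roughly (sketch only; reg, status_reg and DONE are just
placeholders):

        /* needs the readback: write followed by a CPU-side delay */
        writel(val, reg);
        readl(reg);             /* orders the write against udelay() */
        udelay(10);

        /* does not need it: the poll already reads from the endpoint */
        writel(val, reg);
        while (!(readl(status_reg) & DONE))
                cpu_relax();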

> 
> > > 
> > > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in 
> > > hw_init")
> > > Signed-off-by: Konrad Dybcio 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 --
> > >  2 files changed, 6 insertions(+), 13 deletions(-)
> > 
> > I prefer this version compared to the v2. A helper routine is
> > unnecessary here because:
> > 1. there are very few scenarios where we have to read back the same
> > register.
> > 2. we may accidently readback a write only register.
> > 
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 0e3dfd4c2bc8..4135a53b55a7 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> > >   int ret;
> > >   u32 val;
> > >  
> > > - gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > > - /* Wait for the register to finish posting */
> > > - wmb();
> > > + gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> > > + gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> > 
> > This is unnecessary because we are polling on a register on the same port 
> > below. But I think we
> > can replace "wmb()" above with "mb()" to avoid reordering between read
> > and write IO instructions.
> 
> If I understand correctly, you don't need any memory barrier.
> writel()/readl()'s are ordered to the same endpoint. That goes for all
> the reordering/barrier comments mentioned below too.
> 
> device-io.rst:
> 
> The read and write functions are defined to be ordered. That is the
> compiler is not permitted to reorder the I/O sequence. When the ordering
> can be compiler optimised, you can use __readb() and friends to
> indicate the relaxed ordering. Use this with care.
> 
> memory-barriers.txt:
> 
>  (*) readX(), writeX():
> 
>   The readX() and writeX() MMIO accessors take a pointer to the
>   peripheral being accessed as an __iomem * parameter. For pointers
>   mapped with the defa

Re: [PATCH] drm/msm: Add obj flags to gpu devcoredump

2024-05-14 Thread Akhil P Oommen
On Mon, May 13, 2024 at 08:51:47AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> When debugging faults, it is useful to know how the BO is mapped (cached
> vs WC, gpu readonly, etc).
> 
> Signed-off-by: Rob Clark 

Reviewed-by: Akhil P Oommen 

-Akhil

> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 1 +
>  drivers/gpu/drm/msm/msm_gpu.c   | 6 --
>  drivers/gpu/drm/msm/msm_gpu.h   | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index b7bbef2eeff4..d9ea15994ae9 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -887,6 +887,7 @@ void adreno_show(struct msm_gpu *gpu, struct 
> msm_gpu_state *state,
>   drm_printf(p, "  - iova: 0x%016llx\n",
>   state->bos[i].iova);
>   drm_printf(p, "size: %zd\n", state->bos[i].size);
> + drm_printf(p, "flags: 0x%x\n", state->bos[i].flags);
>   drm_printf(p, "name: %-32s\n", state->bos[i].name);
>  
>   adreno_show_object(p, &state->bos[i].data,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index d14ec058906f..ceaee23a4d22 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -222,14 +222,16 @@ static void msm_gpu_crashstate_get_bo(struct 
> msm_gpu_state *state,
>   struct drm_gem_object *obj, u64 iova, bool full)
>  {
>   struct msm_gpu_state_bo *state_bo = &state->bos[state->nr_bos];
> + struct msm_gem_object *msm_obj = to_msm_bo(obj);
>  
>   /* Don't record write only objects */
>   state_bo->size = obj->size;
> + state_bo->flags = msm_obj->flags;
>   state_bo->iova = iova;
>  
> - BUILD_BUG_ON(sizeof(state_bo->name) != sizeof(to_msm_bo(obj)->name));
> + BUILD_BUG_ON(sizeof(state_bo->name) != sizeof(msm_obj->name));
>  
> - memcpy(state_bo->name, to_msm_bo(obj)->name, sizeof(state_bo->name));
> + memcpy(state_bo->name, msm_obj->name, sizeof(state_bo->name));
>  
>   if (full) {
>   void *ptr;
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 685470b84708..05bb247e7210 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -527,6 +527,7 @@ struct msm_gpu_submitqueue {
>  struct msm_gpu_state_bo {
>   u64 iova;
>   size_t size;
> + u32 flags;
>   void *data;
>   bool encoded;
>   char name[32];
> -- 
> 2.45.0
> 


Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

2024-05-14 Thread Akhil P Oommen
On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> Memory barriers help ensure instruction ordering, NOT time and order
> of actual write arrival at other observers (e.g. memory-mapped IP).
> On architectures employing weak memory ordering, the latter can be a
> giant pain point, and it has been as part of this driver.
> 
> Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> readl/writel, which include r/w (respectively) barriers.
> 
> Replace the barriers with a readback that ensures the previous writes
> have exited the write buffer (as the CPU must flush the write to the
> register it's trying to read back) and subsequently remove the hack
> introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> status in hw_init").
> 
> Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 --
>  2 files changed, 6 insertions(+), 13 deletions(-)

I prefer this version compared to v2. A helper routine is
unnecessary here because:
1. there are very few scenarios where we have to read back the same
register.
2. we may accidentally read back a write-only register.

> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 0e3dfd4c2bc8..4135a53b55a7 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
>   int ret;
>   u32 val;
>  
> - gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> - /* Wait for the register to finish posting */
> - wmb();
> + gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> + gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);

This is unnecessary because we are polling on a register on the same
port below. But I think we can replace "wmb()" above with "mb()" to
avoid reordering between the read and write IO instructions.
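
Something like this is what I have in mind (rough, untested sketch; the
poll itself is unchanged from the hunk below):

        gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));

        /* Order the write above against the polling reads below */
        mb();

        ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
                        val & (1 << 1), 100, 1);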

>  
>   ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
>   val & (1 << 1), 100, 1);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 973872ad0474..0acbc38b8e70 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
>   }
>  
>   /* Clear GBIF halt in case GX domain was not collapsed */
> + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);

We need a full barrier here to avoid reordering. Also, let's add a
comment about why we are doing this odd-looking sequence.
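
i.e. something along these lines (rough sketch; the comment wording is
just a suggestion):

        /* Clear GBIF halt in case GX domain was not collapsed */
        gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
        /*
         * The readback forces the write out of the CPU's write buffer,
         * and the full barrier keeps the read from being reordered
         * against the write on weakly ordered CPUs.
         */
        mb();
        gpu_read(gpu, REG_A6XX_GBIF_HALT);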

> + gpu_read(gpu, REG_A6XX_GBIF_HALT);
>   if (adreno_is_a619_holi(adreno_gpu)) {
> - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>   gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> - /* Let's make extra sure that the GPU can access the memory.. */
> - mb();

We need a full barrier here.

> + gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
>   } else if (a6xx_has_gbif(adreno_gpu)) {
> - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>   gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> - /* Let's make extra sure that the GPU can access the memory.. */
> - mb();

We need a full barrier here.

> + gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
>   }
>  
> - /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> - if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
> - spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
> -

Why is this removed?

-Akhil

>   gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
>  
>   if (adreno_is_a619_holi(adreno_gpu))
> 
> ---
> base-commit: 93a39e4766083050ca0ecd6a3548093a3b9eb60c
> change-id: 20240508-topic-adreno-a2d199cd4152
> 
> Best regards,
> -- 
> Konrad Dybcio 
> 


Re: [PATCH v4 04/16] drm/msm: move msm_gpummu.c to adreno/a2xx_gpummu.c

2024-03-25 Thread Akhil P Oommen
On Sun, Mar 24, 2024 at 01:13:55PM +0200, Dmitry Baryshkov wrote:
> On Sun, 24 Mar 2024 at 11:55, Akhil P Oommen  wrote:
> >
> > On Sat, Mar 23, 2024 at 12:56:56AM +0200, Dmitry Baryshkov wrote:
> > > The msm_gpummu.c implementation is used only on A2xx and it is tied to
> > > the A2xx registers. Rename the source file accordingly.
> > >
> >
> > There are very few functions in this file and a2xx_gpu.c is a relatively
> > small source file too. Shall we just move them to a2xx_gpu.c instead of
> > renaming?
> 
> I'd prefer to keep them separate, at least within this series. Let's
> leave that to Rob's discretion.

Sounds good.

Reviewed-by: Akhil P Oommen 

-Akhil

> 
> > -Akhil
> >
> > > Signed-off-by: Dmitry Baryshkov 
> > > ---
> > >  drivers/gpu/drm/msm/Makefile   |  2 +-
> > >  drivers/gpu/drm/msm/adreno/a2xx_gpu.c  |  4 +-
> > >  drivers/gpu/drm/msm/adreno/a2xx_gpu.h  |  4 ++
> > >  .../drm/msm/{msm_gpummu.c => adreno/a2xx_gpummu.c} | 45 
> > > --
> > >  drivers/gpu/drm/msm/msm_mmu.h  |  5 ---
> > >  5 files changed, 31 insertions(+), 29 deletions(-)
> 
> 
> -- 
> With best wishes
> Dmitry


Re: [PATCH v4 10/16] drm/msm: generate headers on the fly

2024-03-25 Thread Akhil P Oommen
On Sun, Mar 24, 2024 at 12:57:43PM +0200, Dmitry Baryshkov wrote:
> On Sun, 24 Mar 2024 at 12:30, Akhil P Oommen  wrote:
> >
> > On Sat, Mar 23, 2024 at 12:57:02AM +0200, Dmitry Baryshkov wrote:
> > > Generate DRM/MSM headers on the fly during kernel build. This removes a
> > > need to push register changes to Mesa with the following manual
> > > synchronization step. Existing headers will be removed in the following
> > > commits (split away to ease reviews).
> >
> > Is this approach common in the upstream kernel? Isn't it a bit awkward
> > from a legal perspective to rely on a source file outside of the kernel
> > during compilation?
> 
> As long as the source file for that file is available. For examples of
> non-trivial generated files see
> arch/arm64/include/generated/sysreg-defs.h and
> arch/arm64/include/generated/cpucap-defs.h

I see that the xml files carry a GPL-compatible license, so I guess
those are fine. The gen_header.py script doesn't include any license.
Shouldn't it have one?

-Akhil.

> 
> -- 
> With best wishes
> Dmitry


Re: [PATCH v4 10/16] drm/msm: generate headers on the fly

2024-03-24 Thread Akhil P Oommen
On Sat, Mar 23, 2024 at 12:57:02AM +0200, Dmitry Baryshkov wrote:
> Generate DRM/MSM headers on the fly during kernel build. This removes a
> need to push register changes to Mesa with the following manual
> synchronization step. Existing headers will be removed in the following
> commits (split away to ease reviews).

Is this approach common in the upstream kernel? Isn't it a bit awkward
from a legal perspective to rely on a source file outside of the kernel
during compilation?

-Akhil

> 
> Signed-off-by: Dmitry Baryshkov 
> ---
>  drivers/gpu/drm/msm/.gitignore |  1 +
>  drivers/gpu/drm/msm/Makefile   | 97 
> +-
>  drivers/gpu/drm/msm/msm_drv.c  |  3 +-
>  drivers/gpu/drm/msm/msm_gpu.c  |  2 +-
>  4 files changed, 80 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/.gitignore b/drivers/gpu/drm/msm/.gitignore
> new file mode 100644
> index ..9ab870da897d
> --- /dev/null
> +++ b/drivers/gpu/drm/msm/.gitignore
> @@ -0,0 +1 @@
> +generated/
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index 26ed4f443149..c861de58286c 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -1,10 +1,11 @@
>  # SPDX-License-Identifier: GPL-2.0
>  ccflags-y := -I $(srctree)/$(src)
> +ccflags-y += -I $(obj)/generated
>  ccflags-y += -I $(srctree)/$(src)/disp/dpu1
>  ccflags-$(CONFIG_DRM_MSM_DSI) += -I $(srctree)/$(src)/dsi
>  ccflags-$(CONFIG_DRM_MSM_DP) += -I $(srctree)/$(src)/dp
>  
> -msm-y := \
> +adreno-y := \
>   adreno/adreno_device.o \
>   adreno/adreno_gpu.o \
>   adreno/a2xx_gpu.o \
> @@ -18,7 +19,11 @@ msm-y := \
>   adreno/a6xx_gmu.o \
>   adreno/a6xx_hfi.o \
>  
> -msm-$(CONFIG_DRM_MSM_HDMI) += \
> +adreno-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
> +
> +adreno-$(CONFIG_DRM_MSM_GPU_STATE)   += adreno/a6xx_gpu_state.o
> +
> +msm-display-$(CONFIG_DRM_MSM_HDMI) += \
>   hdmi/hdmi.o \
>   hdmi/hdmi_audio.o \
>   hdmi/hdmi_bridge.o \
> @@ -31,7 +36,7 @@ msm-$(CONFIG_DRM_MSM_HDMI) += \
>   hdmi/hdmi_phy_8x74.o \
>   hdmi/hdmi_pll_8960.o \
>  
> -msm-$(CONFIG_DRM_MSM_MDP4) += \
> +msm-display-$(CONFIG_DRM_MSM_MDP4) += \
>   disp/mdp4/mdp4_crtc.o \
>   disp/mdp4/mdp4_dsi_encoder.o \
>   disp/mdp4/mdp4_dtv_encoder.o \
> @@ -42,7 +47,7 @@ msm-$(CONFIG_DRM_MSM_MDP4) += \
>   disp/mdp4/mdp4_kms.o \
>   disp/mdp4/mdp4_plane.o \
>  
> -msm-$(CONFIG_DRM_MSM_MDP5) += \
> +msm-display-$(CONFIG_DRM_MSM_MDP5) += \
>   disp/mdp5/mdp5_cfg.o \
>   disp/mdp5/mdp5_cmd_encoder.o \
>   disp/mdp5/mdp5_ctl.o \
> @@ -55,7 +60,7 @@ msm-$(CONFIG_DRM_MSM_MDP5) += \
>   disp/mdp5/mdp5_plane.o \
>   disp/mdp5/mdp5_smp.o \
>  
> -msm-$(CONFIG_DRM_MSM_DPU) += \
> +msm-display-$(CONFIG_DRM_MSM_DPU) += \
>   disp/dpu1/dpu_core_perf.o \
>   disp/dpu1/dpu_crtc.o \
>   disp/dpu1/dpu_encoder.o \
> @@ -85,14 +90,16 @@ msm-$(CONFIG_DRM_MSM_DPU) += \
>   disp/dpu1/dpu_vbif.o \
>   disp/dpu1/dpu_writeback.o
>  
> -msm-$(CONFIG_DRM_MSM_MDSS) += \
> +msm-display-$(CONFIG_DRM_MSM_MDSS) += \
>   msm_mdss.o \
>  
> -msm-y += \
> +msm-display-y += \
>   disp/mdp_format.o \
>   disp/mdp_kms.o \
>   disp/msm_disp_snapshot.o \
>   disp/msm_disp_snapshot_util.o \
> +
> +msm-y += \
>   msm_atomic.o \
>   msm_atomic_tracepoints.o \
>   msm_debugfs.o \
> @@ -115,12 +122,12 @@ msm-y += \
>   msm_submitqueue.o \
>   msm_gpu_tracepoints.o \
>  
> -msm-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
> - dp/dp_debug.o
> +msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
>  
> -msm-$(CONFIG_DRM_MSM_GPU_STATE)  += adreno/a6xx_gpu_state.o
> +msm-display-$(CONFIG_DEBUG_FS) += \
> + dp/dp_debug.o
>  
> -msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
> +msm-display-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
>   dp/dp_catalog.o \
>   dp/dp_ctrl.o \
>   dp/dp_display.o \
> @@ -130,21 +137,69 @@ msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
>   dp/dp_audio.o \
>   dp/dp_utils.o
>  
> -msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
> -
> -msm-$(CONFIG_DRM_MSM_HDMI_HDCP) += hdmi/hdmi_hdcp.o
> +msm-display-$(CONFIG_DRM_MSM_HDMI_HDCP) += hdmi/hdmi_hdcp.o
>  
> -msm-$(CONFIG_DRM_MSM_DSI) += dsi/dsi.o \
> +msm-display-$(CONFIG_DRM_MSM_DSI) += dsi/dsi.o \
>   dsi/dsi_cfg.o \
>   dsi/dsi_host.o \
>   dsi/dsi_manager.o \
>   dsi/phy/dsi_phy.o
>  
> -msm-$(CONFIG_DRM_MSM_DSI_28NM_PHY) += dsi/phy/dsi_phy_28nm.o
> -msm-$(CONFIG_DRM_MSM_DSI_20NM_PHY) += dsi/phy/dsi_phy_20nm.o
> -msm-$(CONFIG_DRM_MSM_DSI_28NM_8960_PHY) += dsi/phy/dsi_phy_28nm_8960.o
> -msm-$(CONFIG_DRM_MSM_DSI_14NM_PHY) += dsi/phy/dsi_phy_14nm.o
> -msm-$(CONFIG_DRM_MSM_DSI_10NM_PHY) += dsi/phy/dsi_phy_10nm.o
> -msm-$(CONFIG_DRM_MSM_DSI_7NM_PHY) += dsi/phy/dsi_phy_7nm.o
> +msm-display-$(CONFIG_DRM_MSM_DSI_28NM_PHY) += dsi/phy/dsi_phy_28nm.o
> +msm-display-$(CONFIG_DRM_MSM_DSI_20NM

Re: [PATCH v4 04/16] drm/msm: move msm_gpummu.c to adreno/a2xx_gpummu.c

2024-03-24 Thread Akhil P Oommen
On Sat, Mar 23, 2024 at 12:56:56AM +0200, Dmitry Baryshkov wrote:
> The msm_gpummu.c implementation is used only on A2xx and it is tied to
> the A2xx registers. Rename the source file accordingly.
> 

There are very few functions in this file and a2xx_gpu.c is a relatively
small source file too. Shall we just move them to a2xx_gpu.c instead of
renaming?

-Akhil

> Signed-off-by: Dmitry Baryshkov 
> ---
>  drivers/gpu/drm/msm/Makefile   |  2 +-
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c  |  4 +-
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.h  |  4 ++
>  .../drm/msm/{msm_gpummu.c => adreno/a2xx_gpummu.c} | 45 
> --
>  drivers/gpu/drm/msm/msm_mmu.h  |  5 ---
>  5 files changed, 31 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index b21ae2880c71..26ed4f443149 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -8,6 +8,7 @@ msm-y := \
>   adreno/adreno_device.o \
>   adreno/adreno_gpu.o \
>   adreno/a2xx_gpu.o \
> + adreno/a2xx_gpummu.o \
>   adreno/a3xx_gpu.o \
>   adreno/a4xx_gpu.o \
>   adreno/a5xx_gpu.o \
> @@ -113,7 +114,6 @@ msm-y += \
>   msm_ringbuffer.o \
>   msm_submitqueue.o \
>   msm_gpu_tracepoints.o \
> - msm_gpummu.o
>  
>  msm-$(CONFIG_DEBUG_FS) += adreno/a5xx_debugfs.o \
>   dp/dp_debug.o
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 0d8133f3174b..0dc255ddf5ce 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -113,7 +113,7 @@ static int a2xx_hw_init(struct msm_gpu *gpu)
>   uint32_t *ptr, len;
>   int i, ret;
>  
> - msm_gpummu_params(gpu->aspace->mmu, &pt_base, &tran_error);
> + a2xx_gpummu_params(gpu->aspace->mmu, &pt_base, &tran_error);
>  
>   DBG("%s", gpu->name);
>  
> @@ -469,7 +469,7 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
> msm_gpu *gpu)
>  static struct msm_gem_address_space *
>  a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
>  {
> - struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
> + struct msm_mmu *mmu = a2xx_gpummu_new(&pdev->dev, gpu);
>   struct msm_gem_address_space *aspace;
>  
>   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.h
> index 161a075f94af..53702f19990f 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.h
> @@ -19,4 +19,8 @@ struct a2xx_gpu {
>  };
>  #define to_a2xx_gpu(x) container_of(x, struct a2xx_gpu, base)
>  
> +struct msm_mmu *a2xx_gpummu_new(struct device *dev, struct msm_gpu *gpu);
> +void a2xx_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
> + dma_addr_t *tran_error);
> +
>  #endif /* __A2XX_GPU_H__ */
> diff --git a/drivers/gpu/drm/msm/msm_gpummu.c 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpummu.c
> similarity index 67%
> rename from drivers/gpu/drm/msm/msm_gpummu.c
> rename to drivers/gpu/drm/msm/adreno/a2xx_gpummu.c
> index f7d1945e0c9f..39641551eeb6 100644
> --- a/drivers/gpu/drm/msm/msm_gpummu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpummu.c
> @@ -5,30 +5,33 @@
>  
>  #include "msm_drv.h"
>  #include "msm_mmu.h"
> -#include "adreno/adreno_gpu.h"
> -#include "adreno/a2xx.xml.h"
>  
> -struct msm_gpummu {
> +#include "adreno_gpu.h"
> +#include "a2xx_gpu.h"
> +
> +#include "a2xx.xml.h"
> +
> +struct a2xx_gpummu {
>   struct msm_mmu base;
>   struct msm_gpu *gpu;
>   dma_addr_t pt_base;
>   uint32_t *table;
>  };
> -#define to_msm_gpummu(x) container_of(x, struct msm_gpummu, base)
> +#define to_a2xx_gpummu(x) container_of(x, struct a2xx_gpummu, base)
>  
>  #define GPUMMU_VA_START SZ_16M
>  #define GPUMMU_VA_RANGE (0xfff * SZ_64K)
>  #define GPUMMU_PAGE_SIZE SZ_4K
>  #define TABLE_SIZE (sizeof(uint32_t) * GPUMMU_VA_RANGE / GPUMMU_PAGE_SIZE)
>  
> -static void msm_gpummu_detach(struct msm_mmu *mmu)
> +static void a2xx_gpummu_detach(struct msm_mmu *mmu)
>  {
>  }
>  
> -static int msm_gpummu_map(struct msm_mmu *mmu, uint64_t iova,
> +static int a2xx_gpummu_map(struct msm_mmu *mmu, uint64_t iova,
>   struct sg_table *sgt, size_t len, int prot)
>  {
> - struct msm_gpummu *gpummu = to_msm_gpummu(mmu);
> + struct a2xx_gpummu *gpummu = to_a2xx_gpummu(mmu);
>   unsigned idx = (iova - GPUMMU_VA_START) / GPUMMU_PAGE_SIZE;
>   struct sg_dma_page_iter dma_iter;
>   unsigned prot_bits = 0;
> @@ -53,9 +56,9 @@ static int msm_gpummu_map(struct msm_mmu *mmu, uint64_t 
> iova,
>   return 0;
>  }
>  
> -static int msm_gpummu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len)
> +static int a2xx_gpummu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len)
>  {
> - struct msm_gpummu *gpummu = to_msm_gpummu(mmu);
> + struct a2xx

Re: [PATCH] drm/msm/a6xx: Fix recovery vs runpm race

2023-12-22 Thread Akhil P Oommen
On Mon, Dec 18, 2023 at 07:59:24AM -0800, Rob Clark wrote:
> 
> From: Rob Clark 
> 
> a6xx_recover() is relying on the gpu lock to serialize against incoming
> submits doing a runpm get, as it tries to temporarily balance out the
> runpm gets with puts in order to power off the GPU.  Unfortunately this
> gets worse when we (in a later patch) will move the runpm get out of the
> scheduler thread/work to move it out of the fence signaling path.
> 
> Instead we can just simplify the whole thing by using force_suspend() /
> force_resume() instead of trying to be clever.

In some places we take a pm_runtime vote and access the GPU registers
assuming the GPU stays powered until we drop the vote;
a6xx_get_timestamp() is an example. If we do a force suspend, it may
cause bus errors from those threads. Now you have to serialize every
place we do runtime_get/put with a mutex. Or is there a better way to
handle the 'later patch' you mentioned?
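
Roughly this pattern (simplified and illustrative only, not the actual
a6xx_get_timestamp() body - the register and error handling are not the
point):

        pm_runtime_get_sync(&gpu->pdev->dev);

        /*
         * The GPU is assumed to stay powered until the put below, but a
         * concurrent pm_runtime_force_suspend() from the recovery path
         * can cut power here and turn this read into a bus error.
         */
        *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);

        pm_runtime_put(&gpu->pdev->dev);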

-Akhil.

> 
> Reported-by: David Heidelberg 
> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10272
> Fixes: abe2023b4cea ("drm/msm/gpu: Push gpu lock down past runpm")
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 268737e59131..a5660d63535b 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1244,12 +1244,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
>   dev_pm_genpd_add_notifier(gmu->cxpd, &gmu->pd_nb);
>   dev_pm_genpd_synced_poweroff(gmu->cxpd);
>  
> - /* Drop the rpm refcount from active submits */
> - if (active_submits)
> - pm_runtime_put(&gpu->pdev->dev);
> -
> - /* And the final one from recover worker */
> - pm_runtime_put_sync(&gpu->pdev->dev);
> + pm_runtime_force_suspend(&gpu->pdev->dev);
>  
>   if (!wait_for_completion_timeout(&gmu->pd_gate, msecs_to_jiffies(1000)))
>   DRM_DEV_ERROR(&gpu->pdev->dev, "cx gdsc didn't collapse\n");
> @@ -1258,10 +1253,7 @@ static void a6xx_recover(struct msm_gpu *gpu)
>  
>   pm_runtime_use_autosuspend(&gpu->pdev->dev);
>  
> - if (active_submits)
> - pm_runtime_get(&gpu->pdev->dev);
> -
> - pm_runtime_get_sync(&gpu->pdev->dev);
> + pm_runtime_force_resume(&gpu->pdev->dev);
>  
>   gpu->active_submits = active_submits;
>   mutex_unlock(&gpu->active_lock);
> -- 
> 2.43.0
> 


Re: [PATCH v2 1/1] drm/msm/adreno: Add support for SM7150 SoC machine

2023-12-07 Thread Akhil P Oommen
On Thu, Nov 23, 2023 at 12:03:56AM +0300, Danila Tikhonov wrote:
> 
> sc7180/sm7125 (atoll) expects speedbins from atoll.dtsi:
> And has a parameter: /delete-property/ qcom,gpu-speed-bin;
> 107 for 504Mhz max freq, pwrlevel 4
> 130 for 610Mhz max freq, pwrlevel 3
> 159 for 750Mhz max freq, pwrlevel 5
> 169 for 800Mhz max freq, pwrlevel 2
> 174 for 825Mhz max freq, pwrlevel 1 (Downstream says 172, but that's probably a typo)
A bit confused. Where do you see 172 in the downstream code? It is 174
there when I checked.
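
FWIW, plugging the FMAX values above into the FMAX/4.8MHz + 2 formula
quoted just below gives (my arithmetic):

        504/4.8 + 2 = 107.000 -> 107
        610/4.8 + 2 = 129.083 -> 130
        750/4.8 + 2 = 158.250 -> 159
        800/4.8 + 2 = 168.667 -> 169
        825/4.8 + 2 = 173.875 -> 174

so 174 for the top bin is consistent with the formula as well.
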
> For the rest of the speed bins, the speed-bin value is calculated as
> FMAX/4.8MHz + 2 round up to zero decimal places.
> 
> sm7150 (sdmmagpie) expects speedbins from sdmmagpie-gpu.dtsi:
> 128 for 610Mhz max freq, pwrlevel 3
> 146 for 700Mhz max freq, pwrlevel 2
> 167 for 800Mhz max freq, pwrlevel 4
> 172 for 504Mhz max freq, pwrlevel 1
> For the rest of the speed bins, the speed-bin value is calculated as
> FMAX/4.8 MHz round up to zero decimal places.
> 
> Creating a new entry does not make much sense.
> I can suggest expanding the standard entry:
> 
> .speedbins = ADRENO_SPEEDBINS(
>     { 0, 0 },
>     /* sc7180/sm7125 */
>     { 107, 3 },
>     { 130, 4 },
>     { 159, 5 },
>     { 168, 1 }, has already
>     { 174, 2 }, has already
>     /* sm7150 */
>     { 128, 1 },
>     { 146, 2 },
>     { 167, 3 },
>     { 172, 4 }, ),
> 

A difference I see between atoll and sdmmagpie is that the former
doesn't support 180Mhz. If you want to do the same, then you need to use
a new bit in the supported-hw bitfield instead of reusing an existing one.
Generally it is better to stick to exactly what downstream does.
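
To illustrate what I mean by a new bit, something like this (the bin
values are from your proposal above; the bit numbers here are made up
just for illustration):

        .speedbins = ADRENO_SPEEDBINS(
                { 0,   0 },
                /* sc7180/sm7125 (atoll) */
                { 107, 3 },
                { 130, 4 },
                { 159, 5 },
                /* sm7150 (sdmmagpie): fresh bits, not reused from atoll */
                { 128, 6 },
                { 146, 7 },
                { 167, 8 },
                { 172, 9 },
        ),

and the sm7150 dtsi would then reference only the new bits in its
opp-supported-hw masks.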

-Akhil.

> All the best,
> Danila
> 
> On 11/22/23 23:28, Konrad Dybcio wrote:
> > 
> > 
> > On 10/16/23 16:32, Dmitry Baryshkov wrote:
> > > On 26/09/2023 23:03, Konrad Dybcio wrote:
> > > > On 26.09.2023 21:10, Danila Tikhonov wrote:
> > > > > 
> > > > > I think you mean by name downstream dt - sdmmagpie-gpu.dtsi
> > > > > 
> > > > > You can see the forked version of the mainline here:
> > > > > https://github.com/sm7150-mainline/linux/blob/next/arch/arm64/boot/dts/qcom/sm7150.dtsi
> > > > > 
> > > > > 
> > > > > All fdt that we got here, if it is useful for you:
> > > > > https://github.com/sm7150-mainline/downstream-fdt
> > > > > 
> > > > > Best wishes, Danila
> > > > Taking a look at downstream, atoll.dtsi (SC7180) includes
> > > > sdmmagpie-gpu.dtsi.
> > > > 
> > > > Bottom line is, they share the speed bins, so it should be
> > > > fine to just extend the existing entry.
> > > 
> > > But then atoll.dtsi rewrites speed bins and pwrlevel bins. So they
> > > are not shared.
> > +Akhil
> > 
> > could you please check internally?
> > 
> > Konrad
> 


Re: [Freedreno] [PATCH 1/7] drm/msm/a6xx: Fix unknown speedbin case

2023-10-17 Thread Akhil P Oommen
On Tue, Oct 17, 2023 at 01:22:27AM +0530, Akhil P Oommen wrote:
> 
> On Tue, Sep 26, 2023 at 08:24:36PM +0200, Konrad Dybcio wrote:
> > 
> > When opp-supported-hw is present under an OPP node, but no form of
> > opp_set_supported_hw() has been called, that OPP is ignored by the API
> > and marked as unsupported.
> > 
> > Before Commit c928a05e4415 ("drm/msm/adreno: Move speedbin mapping to
> > device table"), an unknown speedbin would result in marking all OPPs
> > as available, but it's better to avoid potentially overclocking the
> > silicon - the GMU will simply refuse to power up the chip.
> > 
> > Currently, the Adreno speedbin code does just that (AND returns an
> > invalid error, (int)UINT_MAX). Fix that by defaulting to speedbin 0
> > (which is conveniently always bound to fuseval == 0).
> 
> Wish we documented somewhere that we should reserve BIT(0) for fuse
> val=0 always and assume that would be the super SKU.
Aah! I got this backward. Fuse val=0 is the super SKU, and it is not
safe to fall back to that blindly. Ideally, we should fall back to the
lowest-common-denominator SKU, but it is difficult to predict that
upfront and assign BIT(0).

Anyway, I can't see a better way to handle this.

-Akhil

> 
> Reviewed-by: Akhil P Oommen 
> 
> -Akhil
> 
> > 
> > Fixes: c928a05e4415 ("drm/msm/adreno: Move speedbin mapping to device 
> > table")
> > Signed-off-by: Konrad Dybcio 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index d4e85e24002f..522ca7fe6762 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -2237,7 +2237,7 @@ static int a6xx_set_supported_hw(struct device *dev, 
> > const struct adreno_info *i
> > DRM_DEV_ERROR(dev,
> > "missing support for speed-bin: %u. Some OPPs may not 
> > be supported by hardware\n",
> > speedbin);
> > -   return UINT_MAX;
> > +   supp_hw = BIT(0); /* Default */
> > }
> >  
> > ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
> > 
> > -- 
> > 2.42.0
> > 

