from:"alyssa"

Re: [PATCH v3] drm/atomic-helpers: Invoke end_fb_access while owning plane state

2023-12-03 Thread Alyssa Ross

Thomas Zimmermann  writes:

> Invoke drm_plane_helper_funcs.end_fb_access before
> drm_atomic_helper_commit_hw_done(). The latter function hands over
> ownership of the plane state to the following commit, which might
> free it. Releasing resources in end_fb_access then operates on undefined
> state. This bug has been observed with non-blocking commits when they
> are being queued up quickly.
>
> Here is an example stack trace from the bug report. The plane state has
> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>
> Unable to handle kernel paging request at virtual address 00010049
> [...]
>  drm_gem_fb_vunmap+0x18/0x74
>  drm_gem_end_shadow_fb_access+0x1c/0x2c
>  drm_atomic_helper_cleanup_planes+0x58/0xd8
>  drm_atomic_helper_commit_tail+0x90/0xa0
>  commit_tail+0x15c/0x188
>  commit_work+0x14/0x20
>
> Fix this by running end_fb_access immediately after updating all planes
> in drm_atomic_helper_commit_planes(). The existing clean-up helper
> drm_atomic_helper_cleanup_planes() now only handles cleanup_fb.
>
> For aborted commits, roll back from drm_atomic_helper_prepare_planes()
> in the new helper drm_atomic_helper_unprepare_planes(). This case is
> different from regular cleanup, as we have to release the new state;
> regular cleanup releases the old state. The new helper also invokes
> cleanup_fb for all planes.
>
> The changes mostly involve DRM's atomic helpers. Only two drivers, i915
> and nouveau, implement their own commit function. Update them to invoke
> drm_atomic_helper_unprepare_planes(). Drivers with custom commit_tail
> function do not require changes.
>
> v3:
>   * add drm_atomic_helper_unprepare_planes() for rolling back
>   * use correct state for end_fb_access
> v2:
>   * fix test in drm_atomic_helper_cleanup_planes()
>
> Reported-by: Alyssa Ross 
> Closes: https://lore.kernel.org/dri-devel/87leazm0ya@alyssa.is/
> Suggested-by: Daniel Vetter 
> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane 
> helpers")
> Signed-off-by: Thomas Zimmermann 
> Cc:  # v6.2+

I've been running this for days now, and haven't had a single Oops.
Given the rate with which I encountered them before in this
configuration, it looks very likely that the issue is resolved.

Tested-by: Alyssa Ross 

And, once the wrong parameter name in the kerneldoc identified by the
kernel test robot is resolved,

Reviewed-by: Alyssa Ross 


signature.asc
Description: PGP signature

Re: [PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state

2023-11-29 Thread Alyssa Ross

Thomas Zimmermann  writes:

> Hi
>
> Am 27.11.23 um 17:25 schrieb Alyssa Ross:
>> Thomas Zimmermann  writes:
>> 
>>> Invoke drm_plane_helper_funcs.end_fb_access before
>>> drm_atomic_helper_commit_hw_done(). The latter function hands over
>>> ownership of the plane state to the following commit, which might
>>> free it. Releasing resources in end_fb_access then operates on undefined
>>> state. This bug has been observed with non-blocking commits when they
>>> are being queued up quickly.
>>>
>>> Here is an example stack trace from the bug report. The plane state has
>>> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>>>
>>> Unable to handle kernel paging request at virtual address 00010049
>>> [...]
>>>   drm_gem_fb_vunmap+0x18/0x74
>>>   drm_gem_end_shadow_fb_access+0x1c/0x2c
>>>   drm_atomic_helper_cleanup_planes+0x58/0xd8
>>>   drm_atomic_helper_commit_tail+0x90/0xa0
>>>   commit_tail+0x15c/0x188
>>>   commit_work+0x14/0x20
>>>
>>> For aborted commits, it is still ok to run end_fb_access as part of the
>>> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>>>
>>> v2:
>>> * fix test in drm_atomic_helper_cleanup_planes()
>>>
>>> Reported-by: Alyssa Ross 
>>> Closes: https://lore.kernel.org/dri-devel/87leazm0ya@alyssa.is/
>>> Suggested-by: Daniel Vetter 
>>> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane 
>>> helpers")
>>> Signed-off-by: Thomas Zimmermann 
>>> Cc:  # v6.2+
>>> ---
>>>   drivers/gpu/drm/drm_atomic_helper.c | 17 +
>>>   1 file changed, 17 insertions(+)
>> 
>> Got this basically immediately. :(
>
> I've never seen such problems on other systems. Is there anything 
> different about the Mac systems? How do you trigger these errors?

My understanding is that all sorts of things are different, but I don't
know too much about the details.  There's of course a chance that there
could be some other change in the Asahi Linux kernel that causes this
problem to surface — as I said, I reviewed the diff with mainline and
didn't see anything that looked relevant, but I could well have missed
something.  I don't think I can test mainline directly, as it doesn't
yet support enough of the hardware — for slightly older Apple Silicon
Mac models, I think enough is upstream that this would be possible, but
I don't have access to any.

I started off encountering these errors every few days.  I noticed them
because they would sometimes result in my system either starting to
freeze for 10 seconds at a time, or until I switched VT.  They seem to
correlate with the system being under high CPU load.  I was also able to
substantially increase the frequency with which they occurred by adding
logging to the kernel — even just drm.debug=0x10 makes a big difference,
and when I also added a few dump_backtrace() calls when I was trying to
understand the code and diagnose the problem, I would relatively
consistently encounter an Oops within a few minutes of load.

BTW: v3 is looking good so far.  I've only been testing it since this
morning, though, so I'll keep trying it out for a bit longer before I
declare the problem to have been solved and send a Tested-by.

signature.asc
Description: PGP signature

Re: [PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state

2023-11-27 Thread Alyssa Ross

Thomas Zimmermann  writes:

> Invoke drm_plane_helper_funcs.end_fb_access before
> drm_atomic_helper_commit_hw_done(). The latter function hands over
> ownership of the plane state to the following commit, which might
> free it. Releasing resources in end_fb_access then operates on undefined
> state. This bug has been observed with non-blocking commits when they
> are being queued up quickly.
>
> Here is an example stack trace from the bug report. The plane state has
> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>
> Unable to handle kernel paging request at virtual address 00010049
> [...]
>  drm_gem_fb_vunmap+0x18/0x74
>  drm_gem_end_shadow_fb_access+0x1c/0x2c
>  drm_atomic_helper_cleanup_planes+0x58/0xd8
>  drm_atomic_helper_commit_tail+0x90/0xa0
>  commit_tail+0x15c/0x188
>  commit_work+0x14/0x20
>
> For aborted commits, it is still ok to run end_fb_access as part of the
> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>
> v2:
>   * fix test in drm_atomic_helper_cleanup_planes()
>
> Reported-by: Alyssa Ross 
> Closes: https://lore.kernel.org/dri-devel/87leazm0ya@alyssa.is/
> Suggested-by: Daniel Vetter 
> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane 
> helpers")
> Signed-off-by: Thomas Zimmermann 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 17 +
>  1 file changed, 17 insertions(+)

Got this basically immediately. :(

simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated 
atomic state cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_plane_state] 
Added [PLANE:31:plane-0] 4935bdca state to cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_crtc_state] Added 
[CRTC:33:crtc-0] d25f613d state to cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_set_fb_for_plane] Set 
[FB:38] for [PLANE:31:plane-0] state 4935bdca
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_connector_state] 
Added [CONNECTOR:35:Unknown-1] 20d19f10 state to cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_check_only] checking 
cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
Updating routing for [CONNECTOR:35:Unknown-1]
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
[CONNECTOR:35:Unknown-1] keeps [ENCODER:34:None-34], now on [CRTC:33:crtc-0]
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_nonblocking_commit] 
committing cfb3f1f2 nonblocking
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_default_clear] 
Clearing atomic state cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:__drm_atomic_state_free] Freeing 
atomic state cfb3f1f2
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated 
atomic state 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_plane_state] 
Added [PLANE:31:plane-0] 83f22dc6 state to 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_crtc_state] Added 
[CRTC:33:crtc-0] eec339c5 state to 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_set_fb_for_plane] Set 
[FB:37] for [PLANE:31:plane-0] state 83f22dc6
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_connector_state] 
Added [CONNECTOR:35:Unknown-1] 22495ce9 state to 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_check_only] checking 
03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
Updating routing for [CONNECTOR:35:Unknown-1]
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
[CONNECTOR:35:Unknown-1] keeps [ENCODER:34:None-34], now on [CRTC:33:crtc-0]
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_default_clear] 
Clearing atomic state 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:__drm_atomic_state_free] Freeing 
atomic state 03dc0c0b
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated 
atomic state 03dc0c0b
simple-framebuffer dd53a4

Re: [PATCH] drm/atomic-helpers: Invoke end_fb_access while owning plane state

2023-11-26 Thread Alyssa Ross

Thomas Zimmermann  writes:

> Invoke drm_plane_helper_funcs.end_fb_access before
> drm_atomic_helper_commit_hw_done(). The latter function hands over
> ownership of the plane state to the following commit, which might
> free it. Releasing resources in end_fb_access then operates on undefined
> state. This bug has been observed with non-blocking commits when they
> are being queued up quickly.
>
> Here is an example stack trace from the bug report. The plane state has
> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>
> Unable to handle kernel paging request at virtual address 00010049
> [...]
>  drm_gem_fb_vunmap+0x18/0x74
>  drm_gem_end_shadow_fb_access+0x1c/0x2c
>  drm_atomic_helper_cleanup_planes+0x58/0xd8
>  drm_atomic_helper_commit_tail+0x90/0xa0
>  commit_tail+0x15c/0x188
>  commit_work+0x14/0x20
>
> For aborted commits, it is still ok to run end_fb_access as part of the
> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>
> Reported-by: Alyssa Ross 
> Closes: https://lore.kernel.org/dri-devel/87leazm0ya@alyssa.is/
> Suggested-by: Daniel Vetter 
> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane 
> helpers")
> Signed-off-by: Thomas Zimmermann 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 17 +
>  1 file changed, 17 insertions(+)

I've been trying this patch for the last couple of days.  Alas the
problem doesn't seem to have been resolved entirely, because I've had
the following Oopses:


simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated 
atomic state af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_plane_state] 
Added [PLANE:31:plane-0] 01cc7517 state to af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_crtc_state] Added 
[CRTC:33:crtc-0] e546877a state to af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_set_fb_for_plane] Set 
[FB:37] for [PLANE:31:plane-0] state 01cc7517
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_connector_state] 
Added [CONNECTOR:35:Unknown-1] 8cee195b state to af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_check_only] checking 
af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
Updating routing for [CONNECTOR:35:Unknown-1]
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
[CONNECTOR:35:Unknown-1] keeps [ENCODER:34:None-34], now on [CRTC:33:crtc-0]
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_nonblocking_commit] 
committing af08a086 nonblocking
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_default_clear] 
Clearing atomic state af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:__drm_atomic_state_free] Freeing 
atomic state af08a086
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated 
atomic state f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_plane_state] 
Added [PLANE:31:plane-0] d3b51954 state to f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_crtc_state] Added 
[CRTC:33:crtc-0] e7c9e6b8 state to f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_set_fb_for_plane] Set 
[FB:38] for [PLANE:31:plane-0] state d3b51954
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_get_connector_state] 
Added [CONNECTOR:35:Unknown-1] 016b7c7e state to f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_check_only] checking 
f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
Updating routing for [CONNECTOR:35:Unknown-1]
simple-framebuffer dd53a4000.framebuffer: [drm:update_connector_routing] 
[CONNECTOR:35:Unknown-1] keeps [ENCODER:34:None-34], now on [CRTC:33:crtc-0]
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_add_encoder_bridges] 
Adding all bridges for [encoder:34:None-34] to f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_default_clear] 
Clearing atomic state f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:__drm_atomic_state_free] Freeing 
atomic state f87a08e9
simple-framebuffer dd53a4000.framebuffer: [drm:drm_atomic_state_init] Allocated

Use after free with GEM shadow-buffered planes

2023-11-16 Thread Alyssa Ross

[Originally reported at https://gitlab.freedesktop.org/drm/misc/-/issues/33]

The following happens in a cycle:

 • An atomic state is allocated
 • A plane state is allocated (drm_gem_duplicate_shadow_plane_state())
 • Commit (drm_atomic_helper_commit(), possibly nonblocking / asynchronously)
 • The previous plane state is freed (drm_gem_destroy_shadow_plane_state())
 • The atomic state is put

But what happens if a nonblocking commit doesn't get scheduled until a
couple of iterations later in the cycle?  Plane states are not
refcounted, so by that point, the plane state has been freed, and so
commit_tail() will encounter a use after free when it accesses the plane
state.

I encountered this issue using simpledrm on the Asahi kernel based on
v6.5, but none of the files I examined to determine that this is a
use-after-free have been modified from mainline.  I've also reviewed the
diff between my kernel and tip of mainline (8f6f76a6a29f), and didn't
see anything that would affect this issue.

Here's an example of a use after free.  It's been a couple of weeks
since I thoroughly investigated this, but from memory, in this case, the
plane state has been overwritten by a struct drm_crtc_state.

Unable to handle kernel paging request at virtual address 00010049
Mem abort info:
  ESR = 0x9605
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x0005, ISS2 = 0x
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 16k pages, 48-bit VAs, pgdp=00080e0e31b0
[00010049] pgd=08083d390003, p4d=08083d390003, 
pud=08083db9c003, pmd=
Internal error: Oops: 9605 [#1] PREEMPT SMP
Modules linked in: overlay uas usb_storage usbhid rfcomm snd_seq_dummy 
snd_hrtimer snd_seq snd_seq_device bnep des_generic libdes md4 brcmfmac_wcc 
joydev hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 hid_magicmouse 
ecdh_generic ecc rfkill snd_soc_macaudio macsmc_hid macsmc_power macsmc_reboot 
ofpart spi_nor apple_isp videobuf2_dma_sg snd_soc_cs42l84 snd_soc_tas2764 
videobuf2_memops clk_apple_nco snd_soc_apple_mca apple_admac videobuf2_v4l2 
videodev videobuf2_common mc hid_apple pwm_apple leds_pwm apple_soc_cpufreq 
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_rpfilter 
ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables nfnetlink 
loop tun tap macvlan bridge stp llc fuse zstd zram dm_crypt xhci_plat_hcd 
xhci_hcd nvmem_spmi_mfd rtc_macsmc gpio_macsmc tps6598x dockchannel_hid 
simple_mfd_spmi regmap_spmi nvme_apple phy_apple_atc dwc3 pcie_apple typec 
pci_host_common udc_core apple_sart macsmc_rtkit apple_rtkit_helper 
apple_dockchannel macsmc apple_rtkit mfd_core
 spmi_apple_controller nvmem_apple_efuses pinctrl_apple_gpio spi_apple 
i2c_apple apple_dart apple_mailbox btrfs xor xor_neon raid6_pq
CPU: 0 PID: 1095074 Comm: kworker/u16:11 Tainted: G S 
6.5.0-asahi #1-NixOS
Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
Workqueue: events_unbound commit_work
pstate: 2149 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : drm_gem_fb_vunmap+0x18/0x74
lr : drm_gem_end_shadow_fb_access+0x1c/0x2c
sp : 800087ea3d00
x29: 800087ea3d00 x28:  x27: 
x26: 800081325000 x25: fef7 x24: 46c5b560
x23: 01fcaa05 x22:  x21: 00010001
x20: 46c5b500 x19: 0001 x18: 
x17:  x16:  x15: 2e2d5ab0
x14: 0195 x13:  x12: 800081310a80
x11: 0001 x10: 1444e7e23f083897 x9 : 6e82f0b7605f292f
x8 : 0001249e0f48 x7 : 0004 x6 : 0190
x5 : 0001 x4 : 93c54440 x3 : 0e968000
x2 : 80008077883c x1 : 9ce37498 x0 : 00010001
Call trace:
 drm_gem_fb_vunmap+0x18/0x74
 drm_gem_end_shadow_fb_access+0x1c/0x2c
 drm_atomic_helper_cleanup_planes+0x58/0xd8
 drm_atomic_helper_commit_tail+0x90/0xa0
 commit_tail+0x15c/0x188
 commit_work+0x14/0x20
 process_one_work+0x1e0/0x344
 worker_thread+0x68/0x424
 kthread+0xf4/0x100
 ret_from_fork+0x10/0x20
Code: 910003fd a90153f3 f90013f5 aa0003f5 (f9402400) 
---[ end trace  ]---


signature.asc
Description: PGP signature

Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

2023-07-15 Thread alyssa

15 July 2023 at 00:03, "Luben Tuikov"  wrote:


> 
> On 2023-07-14 05:57, Christian König wrote:
> 
> > 
> > Am 14.07.23 um 11:49 schrieb Asahi Lina:
> > 
> > > 
> > > On 14/07/2023 17.43, Christian König wrote:
> > > 
> > 
> >  Am 14.07.23 um 10:21 schrieb Asahi Lina:
> >  A signaled scheduler fence can outlive its scheduler, since fences are
> >  independencly reference counted. Therefore, we can't reference the
> >  scheduler in the get_timeline_name() implementation.
> > 
> >  Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
> >  dma-bufs reference fences from GPU schedulers that no longer exist.
> > 
> >  Signed-off-by: Asahi Lina 
> >  ---
> >     drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
> >     drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
> >     include/drm/gpu_scheduler.h  | 5 +
> >     3 files changed, 14 insertions(+), 2 deletions(-)
> > 
> >  diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> >  b/drivers/gpu/drm/scheduler/sched_entity.c
> >  index b2bbc8a68b30..17f35b0b005a 100644
> >  --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >  +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >  @@ -389,7 +389,12 @@ static bool 
> >  drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> >        /*
> >      * Fence is from the same scheduler, only need to wait for
> >  - * it to be scheduled
> >  + * it to be scheduled.
> >  + *
> >  + * Note: s_fence->sched could have been freed and reallocated
> >  + * as another scheduler. This false positive case is okay, 
> >  as if
> >  + * the old scheduler was freed all of its jobs must have
> >  + * signaled their completion fences.
> > 
> >  This is outright nonsense. As long as an entity for a scheduler exists
> >  it is not allowed to free up this scheduler.
> > 
> >  So this function can't be called like this.
> > 
> > > 
> > > As I already explained, the fences can outlive their scheduler. That 
> > >  means *this* entity certainly exists for *this* scheduler, but the 
> > >  *dependency* fence might have come from a past scheduler that was 
> > >  already destroyed along with all of its entities, and its address reused.
> > > 
> > 
> >  
> >  Well this is function is not about fences, this function is a callback 
> >  for the entity.
> >  
> > 
> > > 
> > > Christian, I'm really getting tired of your tone. I don't appreciate 
> > >  being told my comments are "outright nonsense" when you don't even 
> > >  take the time to understand what the issue is and what I'm trying to 
> > >  do/document. If you aren't interested in working with me, I'm just 
> > >  going to give up on drm_sched, wait until Rust gets workqueue support, 
> > >  and reimplement it in Rust. You can keep your broken fence lifetime 
> > >  semantics and I'll do my own thing.
> > > 
> > 
> >  
> >  I'm certainly trying to help here, but you seem to have unrealistic 
> >  expectations.
> >  
> >  I perfectly understand what you are trying to do, but you don't seem to 
> >  understand that this functionality here isn't made for your use case.
> >  
> >  We can adjust the functionality to better match your requirements, but 
> >  you can't say it is broken because it doesn't work when you use it not 
> >  in the way it is intended to be used.
> > 
> 
> I believe "adjusting" functionality to fit some external requirements,
> may have unintended consequences, requiring yet more and more "adjustments".
> (Or may allow (new) drivers to do wild things which may lead to wild results. 
> :-) )
> 
> We need to be extra careful and wary of this.

Either drm/scheduler is common code that we should use for our driver, in which 
case we need to "adjust" it to fit the requirements of a safe Rust abstraction 
usable for AGX. Or, drm/scheduler is not common code intended for drivers with 
our requirements, and then we need to be able to write our own scheduler.

AMD has NAK'd both options, effectively NAK'ing the driver.

I will ask a simple yes/no question: Should we use drm/sched?

If yes, it will need patches like these, and AMD needs to be ok with that and 
stop NAK'ing them on sight becuase they don't match the existing requirements.

If no, we will write our own scheduler in Rust, and AMD needs to be ok with 
that and not NAK it on sight because it's not drm/sched.

Which is it?

Note if we write a Rust scheduler, drm/sched and amdgpu will be unaffected. If 
we do that and AMD comes back and NAKs it -- as said in this thread would 
"probably" happen -- then it is impossible for us to upstream a driver 
regardless of whether we use drm/sched.

Lina has been polite and accommodating while AMD calls her code "outright 
nonsense" and gets "outright NAK"s, and puts her into an impossible catch-22 
where no matter what she does it's NAK'd.

That's not ok.

Re: [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler is torn down

2023-03-08 Thread alyssa

> You can't ask me for a list
> of pending jobs (the scheduler knows this, it doesn't make any sense to
> duplicate that outside)

Silly question: could you add a new exported function to drm_sched to get the 
list of pending jobs, to be used by the Rust abstraction internally? IDK if 
that makes any sense.

Re: [PATCH 4/5] drm/panfrost: Use drm_sched_job_add_syncobj_dependency()

2023-02-08 Thread Alyssa Rosenzweig

R-b, thanks

On Wed, Feb 08, 2023 at 04:48:16PM -0300, Ma??ra Canal wrote:
> As panfrost_copy_in_sync() performs the same steps as
> drm_sched_job_add_syncobj_dependency(), replace the open-coded
> implementation in Panfrost in order to simply, using the DRM function.
> 
> Signed-off-by: Ma??ra Canal 
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index abb0dadd8f63..f49096f53141 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -220,15 +220,8 @@ panfrost_copy_in_sync(struct drm_device *dev,
>   }
>  
>   for (i = 0; i < in_fence_count; i++) {
> - struct dma_fence *fence;
> -
> - ret = drm_syncobj_find_fence(file_priv, handles[i], 0, 0,
> -  &fence);
> - if (ret)
> - goto fail;
> -
> - ret = drm_sched_job_add_dependency(&job->base, fence);
> -
> + ret = drm_sched_job_add_syncobj_dependency(&job->base, 
> file_priv,
> +handles[i], 0);
>   if (ret)
>   goto fail;
>   }
> -- 
> 2.39.1
>

Re: [RFC PATCH] drm/pancsf: Add a new driver for Mali CSF-based GPUs

2023-02-03 Thread Alyssa Rosenzweig

> > > +struct drm_pancsf_gpu_info {
> > > +#define DRM_PANCSF_ARCH_MAJOR(x) ((x) >> 28)
> > > +#define DRM_PANCSF_ARCH_MINOR(x) (((x) >> 24) & 0xf)
> > > +#define DRM_PANCSF_ARCH_REV(x)   (((x) >> 20) & 0xf)
> > > +#define DRM_PANCSF_PRODUCT_MAJOR(x)  (((x) >> 16) & 0xf)
> > > +#define DRM_PANCSF_VERSION_MAJOR(x)  (((x) >> 12) & 0xf)
> > > +#define DRM_PANCSF_VERSION_MINOR(x)  (((x) >> 4) & 0xff)
> > > +#define DRM_PANCSF_VERSION_STATUS(x) ((x) & 0xf)
> > > + __u32 gpu_id;
> > > + __u32 gpu_rev;
> > > +#define DRM_PANCSF_CSHW_MAJOR(x) (((x) >> 26) & 0x3f)
> > > +#define DRM_PANCSF_CSHW_MINOR(x) (((x) >> 20) & 0x3f)
> > > +#define DRM_PANCSF_CSHW_REV(x)   (((x) >> 16) & 0xf)
> > > +#define DRM_PANCSF_MCU_MAJOR(x)  (((x) >> 10) & 0x3f)
> > > +#define DRM_PANCSF_MCU_MINOR(x)  (((x) >> 4) & 0x3f)
> > > +#define DRM_PANCSF_MCU_REV(x)((x) & 0xf)
> > > + __u32 csf_id;
> > > + __u32 l2_features;
> > > + __u32 tiler_features;
> > > + __u32 mem_features;
> > > + __u32 mmu_features;
> > > + __u32 thread_features;
> > > + __u32 max_threads;
> > > + __u32 thread_max_workgroup_size;
> > > + __u32 thread_max_barrier_size;
> > > + __u32 coherency_features;
> > > + __u32 texture_features[4];
> > > + __u32 as_present;
> > > + __u32 core_group_count;
> > > + __u64 shader_present;
> > > + __u64 l2_present;
> > > + __u64 tiler_present;
> > > +};
> > > +
> > > +struct drm_pancsf_csif_info {
> > > + __u32 csg_slot_count;
> > > + __u32 cs_slot_count;
> > > + __u32 cs_reg_count;
> > > + __u32 scoreboard_slot_count;
> > > + __u32 unpreserved_cs_reg_count;
> > > +};
> > > +
> > > +struct drm_pancsf_dev_query {
> > > + /** @type: the query type (see enum drm_pancsf_dev_query_type). */
> > > + __u32 type;
> > > +
> > > + /**
> > > +  * @size: size of the type being queried.
> > > +  *
> > > +  * If pointer is NULL, size is updated by the driver to provide the
> > > +  * output structure size. If pointer is not NULL, the the driver will
> > > +  * only copy min(size, actual_structure_size) bytes to the pointer,
> > > +  * and update the size accordingly. This allows us to extend query
> > > +  * types without breaking userspace.
> > > +  */
> > > + __u32 size;
> > > +
> > > + /**
> > > +  * @pointer: user pointer to a query type struct.
> > > +  *
> > > +  * Pointer can be NULL, in which case, nothing is copied, but the
> > > +  * actual structure size is returned. If not NULL, it must point to
> > > +  * a location that's large enough to hold size bytes.
> > > +  */
> > > + __u64 pointer;
> > > +};  
> > 
> > Genuine question: is there something wrong with the panfrost 'get_param'
> > ioctl where individual features are queried one-by-one, rather than
> > passing a big structure back to user space.
> 
> Well, I've just seen the Xe driver exposing things this way, and I thought
> it was a good idea, but I don't have a strong opinion here, and if others
> think it's preferable to stick to GET_PARAM, I'm fine with that too.

I vastly prefer the info struct, GET_PARAM isn't a great interface when
there are large numbers of properties to query... Actually I just
suggested to Lina that she adopt this approach for Asahi instead of the
current GET_PARAM ioctl we have (downstream for now).

It isn't a *big* deal but GET_PARAM doesn't really seem better on any
axes.

> > I ask because we've had issues in the past with trying to 'deprecate'
> > registers - if a new version of the hardware stops providing a
> > (meaningful) value for a register then it's hard to fix up the
> > structures.

I'm not sure this is a big deal. If the register no longer exists
(meaningfully), zero it out in the info structure and trust userspace to
interpret meaningfully based on the GPU. If registers are getting
dropped between revisions, that's obviously not great. But this should
only change at major architecture boundaries; I don't see the added
value of doing the interpretation in kernel instead of userspace. I say
this with my userspace hat on, of course ;-)

> > There is obviously overhead iterating over all the register that user
> > space cares about. Another option (used by kbase) is to return some form
> > of structured data so a missing property can be encoded.
> 
> I'll have a look at how kbase does that. Thanks for the pointer.

I'd be fine with the kbase approach but I don't really see the added
value over what Boris proposed in the RFC, tbh.

Re: [RFC PATCH] drm/pancsf: Add a new driver for Mali CSF-based GPUs

2023-02-03 Thread Alyssa Rosenzweig

> > Mali v10 (second Valhal iteration) and later GPUs replaced the Job
> > Manager block by a command stream based interface called CSF (for
> > Command Stream Frontend). This interface is not only turning the job
> > chain based submission model into a command stream based one, but also
> > introducing FW-assisted scheduling of command stream queues. This is a
> > fundamental shift in both how userspace is supposed to submit jobs, but
> > also how the driver is architectured. We initially tried to retrofit the
> > CSF model into panfrost, but this ended up introducing unneeded
> > complexity to the existing driver, which we all know is a potential
> > source of regression.
> 
> While I agree there's some big differences which effectively mandate
> splitting the driver I do think there are some parts which make a lot of
> sense to share.
> 
> For example pancsf_regs.h and panfrost_regs.h are really quite similar
> and I think could easily be combined. The clock/regulator code is pretty
> much a direct copy/paste (just adding support for more clocks), etc.
> 
> What would be ideal is factoring out 'generic' parts from panfrost and
> then being able to use them from pancsf.
> 
> I had a go at starting that:
> 
> https://gitlab.arm.com/linux-arm/linux-sp/-/tree/pancsf-refactor
> 
> (lightly tested for Panfrost, only build tested for pancsf).
> 
> That saves around 200 lines overall and avoids needing to maintain two
> lots of clock/regulator code. There's definite scope for sharing (most)
> register definitions between the drivers and quite possibly some of the
> MMU/memory code (although there's diminishing returns there).

200 lines saved in a 5kloc+ driver doesn't seem worth much, especially
against the added testing combinatorics, TBH. The main reason I can see
to unify is if we want VM_BIND (and related goodies) on JM hardware too.
That's only really for Vulkan and I really don't see the case for Vulkan
on anything older than Valhall at this point. So it comes down to
whether we want to start Vulkan at v9 or skip to v10. The separate
panfrost/pancsf drivers approach strongly favours the latter.

Re: Retiring the GitHub mirrors

2023-01-20 Thread Alyssa Rosenzweig

> Among the people present in this discussion, the consensus was that we
> should delete them.

I wasn't present but +1 from me.

Re: [RFC PATCH 00/20] Initial Xe driver submission

2023-01-03 Thread Alyssa Rosenzweig

> > For one thing, setting that up would be a lot of up front infrastructure
> > work. I'm not sure how to even pull that off when Xe is still
> > out-of-tree and i915 development plunges on upstream as ever.
> > 
> > For another, realistically, the overlap between supported platforms is
> > going to end at some point, and eventually new platforms are only going
> > to be supported with Xe. That's going to open up new possibilities for
> > refactoring also the display code. I think it would be premature to lock
> > in to a common directory structure or a common helper module at this
> > point.
> > 
> > I'm not saying no to the idea, and we've contemplated it before, but I
> > think there are still too many moving parts to decide to go that way.
> 
> FWIW, I actually have the same dilemma with the driver for new Mali GPUs
> I'm working on. I initially started making it a sub-driver of the
> existing panfrost driver (some HW blocks are similar, like the
> IOMMU and a few other things, and some SW abstracts can be shared here
> and there, like the GEM allocator logic). But I'm now considering
> forking the driver (after Alyssa planted the seed :-)), not only
> because I want to start from a clean sheet on the the uAPI front
> (wouldn't be an issue in your case, because you're talking about
> sharing helpers, not the driver frontend), but also because any refactor
> to panfrost is a potential source of regression for existing users. So,
> I tend to agree with Jani here, trying to share code before things have
> settled down is likely to cause pain to both Xe and i915
> users+developers.

++

I pretend to have never written a kernel driver, so will not comment
there. But Boris and I were previously bit trying to share code between
our GL and VK drivers, before VK settled down, causing pain for both. I
don't want a kernelside repeat of that (for either Mali or Intel).

I tend to think that, if you're tempted to share a driver frontend
without the backend, that's a sign that there's too much boilerplate for
the frontend and maybe there needs to be more helpers somewhere. For Xe,
that doesn't apply since the hw overlaps between the drivers, but for
Mali, there really is more different than similar and there's an
obvious, acute break between "old Mali" and "new Mali". The shared
"instantiate a DRM driver boilerplate" is pretty trivial, and the MMU
code is as simple as it gets...

Re: [PATCH v2 16/26] drm: panfrost: Remove #ifdef guards for PM related functions

2022-11-29 Thread Alyssa Rosenzweig

Sounds like a nice clean up :-) 

Acked-by: Alyssa Rosenzweig 

On Tue, Nov 29, 2022 at 07:19:32PM +, Paul Cercueil wrote:
> Use the EXPORT_GPL_RUNTIME_DEV_PM_OPS() and pm_ptr() macros to handle
> the PM callbacks.
> 
> These macros allow the PM functions to be automatically dropped by the
> compiler when CONFIG_PM is disabled, without having to use #ifdef
> guards.
> 
> This has the advantage of always compiling these functions in,
> independently of any Kconfig option. Thanks to that, bugs and other
> regressions are subsequently easier to catch.
> 
> Signed-off-by: Paul Cercueil 
> Reviewed-by: Steven Price 
> ---
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Steven Price 
> Cc: Alyssa Rosenzweig 
> ---
>  drivers/gpu/drm/panfrost/panfrost_device.c | 10 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h |  4 ++--
>  drivers/gpu/drm/panfrost/panfrost_drv.c|  7 +--
>  3 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
> b/drivers/gpu/drm/panfrost/panfrost_device.c
> index ee612303f076..fa1a086a862b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "panfrost_device.h"
> @@ -400,8 +401,7 @@ void panfrost_device_reset(struct panfrost_device *pfdev)
>   panfrost_job_enable_interrupts(pfdev);
>  }
>  
> -#ifdef CONFIG_PM
> -int panfrost_device_resume(struct device *dev)
> +static int panfrost_device_resume(struct device *dev)
>  {
>   struct panfrost_device *pfdev = dev_get_drvdata(dev);
>  
> @@ -411,7 +411,7 @@ int panfrost_device_resume(struct device *dev)
>   return 0;
>  }
>  
> -int panfrost_device_suspend(struct device *dev)
> +static int panfrost_device_suspend(struct device *dev)
>  {
>   struct panfrost_device *pfdev = dev_get_drvdata(dev);
>  
> @@ -423,4 +423,6 @@ int panfrost_device_suspend(struct device *dev)
>  
>   return 0;
>  }
> -#endif
> +
> +EXPORT_GPL_RUNTIME_DEV_PM_OPS(panfrost_pm_ops, panfrost_device_suspend,
> +   panfrost_device_resume, NULL);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
> b/drivers/gpu/drm/panfrost/panfrost_device.h
> index 8b25278f34c8..d9ba68cffb77 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -7,6 +7,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -172,8 +173,7 @@ int panfrost_device_init(struct panfrost_device *pfdev);
>  void panfrost_device_fini(struct panfrost_device *pfdev);
>  void panfrost_device_reset(struct panfrost_device *pfdev);
>  
> -int panfrost_device_resume(struct device *dev);
> -int panfrost_device_suspend(struct device *dev);
> +extern const struct dev_pm_ops panfrost_pm_ops;
>  
>  enum drm_panfrost_exception_type {
>   DRM_PANFROST_EXCEPTION_OK = 0x00,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2fa5afe21288..fa619fe72086 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -676,17 +676,12 @@ static const struct of_device_id dt_match[] = {
>  };
>  MODULE_DEVICE_TABLE(of, dt_match);
>  
> -static const struct dev_pm_ops panfrost_pm_ops = {
> - SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, 
> pm_runtime_force_resume)
> - SET_RUNTIME_PM_OPS(panfrost_device_suspend, panfrost_device_resume, 
> NULL)
> -};
> -
>  static struct platform_driver panfrost_driver = {
>   .probe  = panfrost_probe,
>   .remove = panfrost_remove,
>   .driver = {
>   .name   = "panfrost",
> - .pm = &panfrost_pm_ops,
> + .pm = pm_ptr(&panfrost_pm_ops),
>   .of_match_table = dt_match,
>   },
>  };
> -- 
> 2.35.1
>

Re: [PATCH] drm/panfrost: Remove type name from internal struct again

2022-11-07 Thread Alyssa Rosenzweig

Reviewed-by: Alyssa Rosenzweig 

On Thu, Nov 03, 2022 at 11:40:36AM +, Steven Price wrote:
> Commit 72655fb942c1 ("drm/panfrost: replace endian-specific types with
> native ones") accidentally reverted part of the parent commit
> 7228d9d79248 ("drm/panfrost: Remove type name from internal structs")
> leading to the situation that the Panfrost UAPI header still doesn't
> compile correctly in C++.
> 
> Revert the accidental revert and pass me a brown paper bag.
> 
> Reported-by: Alyssa Rosenzweig 
> Fixes: 72655fb942c1 ("drm/panfrost: replace endian-specific types with native 
> ones")
> Signed-off-by: Steven Price 
> ---
>  include/uapi/drm/panfrost_drm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index 6f93c915cc88..9f231d40a146 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -254,7 +254,7 @@ struct panfrost_dump_object_header {
>   __u64 nbos;
>   } reghdr;
>  
> - struct pan_bomap_hdr {
> + struct {
>   __u32 valid;
>   __u64 iova;
>   __u32 data[2];
> -- 
> 2.34.1
>

Re: [PATCH v2 0/2] drm/panfrost: Fix UAPI for C++/BSD compatibility

2022-10-17 Thread Alyssa Rosenzweig

Series is

Reviewed-by: Alyssa Rosenzweig 

Thank you for this, please push to the appropriate trees so we can fix
the Mesa build.

On Mon, Oct 17, 2022 at 11:46:00AM +0100, Steven Price wrote:
> The Panfrost DRM interface to user space is uesd in Mesa for targets
> other than C/Linux. Specifically the header file needs to compile in C++
> code and for FreeBSD which shares the same UABI.
> 
> The first patch fixes the C++ compilation issue by removing the
> (unnecessary) type name from internal structs which is invalid in C++.
> 
> The second patch technically changes the UABI by changing the header
> values in the dump format to be native endian rather than fixed
> little-endian. Since (a) there are no known big-endian Mali systems, and
> (b) this has only appeared in -rc1, this shouldn't break user space.
> Tools can use the 'magic' field to identify the endianness of the dump
> if they want to support big-endian.
> 
> This is effectively a 'v2' of Adri??n's series here [1].
> 
> [1] 
> https://lore.kernel.org/r/20220920211545.1017355-1-adrian.larumbe%40collabora.com
> 
> Steven Price (2):
>   drm/panfrost: Remove type name from internal structs
>   drm/panfrost: replace endian-specific types with native ones
> 
>  drivers/gpu/drm/panfrost/panfrost_dump.c | 36 
>  include/uapi/drm/panfrost_drm.h  | 36 +---
>  2 files changed, 38 insertions(+), 34 deletions(-)
> 
> -- 
> 2.34.1
>

Re: [PATCH 2/2] drm/panfrost: replace endian-specific types with generic ones

2022-09-21 Thread Alyssa Rosenzweig

> > Or of course we could just actually use native endian and detect from
> > the magic which endian is in use. That would require ripping out the
> > cpu_to_lexx() calls in Linux and making the user space tool more
> > intelligent. I'm happy with that, but it's pushing the complexity onto Mesa.
> 
> If there's a clearly identifiable header, then I'd say making the whole dump
> native-endian is probably the way to go. Unless and until anyone actually
> demands to be able to do cross-endian post-mortem GPU debugging, the
> realistic extent of the complexity in Mesa is that it doesn't recognise the
> foreign dump format and gives up, which I assume is already implemented :)

+1 to this solution. Gets the complexity out of both kernel and Mesa,
and in the vanishingly unlikely scenario that we need this
functionality, we can add it to Mesa without kernel changes. As mesa
panfrost maintainer I'll take those odds :+1:

Re: [PATCH 2/2] drm/panfrost: replace endian-specific types with generic ones

2022-09-20 Thread Alyssa Rosenzweig

Tentative r-b, but we *do* need to make a decision on how we want to
handle endianness. I don't have strong feelings but the results of that
discussion should go in the commit message.

On Tue, Sep 20, 2022 at 10:15:45PM +0100, Adri??n Larumbe wrote:
> __le32 and __l64 endian-specific types aren't portable and not available on
> FreeBSD, for which there's a uAPI compatible reimplementation of Panfrost.
> 
> Replace these specific types with more generic unsigned ones, to prevent
> FreeBSD Mesa build errors.
> 
> Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7252
> Fixes: 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
> Signed-off-by: Adri??n Larumbe 
> ---
>  include/uapi/drm/panfrost_drm.h | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index bd77254be121..c1a10a9366a9 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -236,24 +236,24 @@ struct drm_panfrost_madvise {
>  #define PANFROSTDUMP_BUF_TRAILER (PANFROSTDUMP_BUF_BO + 1)
>  
>  struct panfrost_dump_object_header {
> - __le32 magic;
> - __le32 type;
> - __le32 file_size;
> - __le32 file_offset;
> + __u32 magic;
> + __u32 type;
> + __u32 file_size;
> + __u32 file_offset;
>  
>   union {
>   struct {
> - __le64 jc;
> - __le32 gpu_id;
> - __le32 major;
> - __le32 minor;
> - __le64 nbos;
> + __u64 jc;
> + __u32 gpu_id;
> + __u32 major;
> + __u32 minor;
> + __u64 nbos;
>   } reghdr;
>  
>   struct {
> - __le32 valid;
> - __le64 iova;
> - __le32 data[2];
> + __u32 valid;
> + __u64 iova;
> + __u32 data[2];
>   } bomap;
>  
>   /*
> @@ -261,14 +261,14 @@ struct panfrost_dump_object_header {
>* with new fields and also keep it 512-byte aligned
>*/
>  
> - __le32 sizer[496];
> + __u32 sizer[496];
>   };
>  };
>  
>  /* Registers object, an array of these */
>  struct panfrost_dump_registers {
> - __le32 reg;
> - __le32 value;
> + __u32 reg;
> + __u32 value;
>  };
>  
>  #if defined(__cplusplus)
> -- 
> 2.37.0
>

Re: [PATCH] drm/panfrost: Give name to anonymous coredump object union

2022-09-20 Thread Alyssa Rosenzweig

On Tue, Sep 20, 2022 at 02:26:52PM +0100, Steven Price wrote:
> On 19/09/2022 07:44, Adri??n Larumbe wrote:
> > Hi Steven,
> > 
> > On 13.09.2022 09:45, Steven Price wrote:
> >> On 12/09/2022 17:44, Adri??n Larumbe wrote:
> >>> Building Mesa's Perfetto requires including the panfrost drm uAPI header 
> >>> in
> >>> C++ code, but the C++ compiler requires anonymous unions to have only
> >>> public non-static data members.
> >>>
> >>> Commit 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
> >>> introduces one such union, breaking the Mesa build.
> >>>
> >>> Give it a name, and also rename pan_reg_hdr structure because it will
> >>> always be prefixed by the union name.
> >>>
> >>> Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7195
> >>>
> >>> Signed-off-by: Adri??n Larumbe 
> > 
> >> Ouch! It's frustrating how C++ isn't quite a superset of C. However I
> >> think we can solve this with a simpler patch, I'd appreciate testing
> >> that this does indeed fix the build issues with Mesa with all supported
> >> compilers (I'm not so familiar with C++):
> > 
> > I just tested your changes on Mesa and they do fix the build.
> 
> Thanks Adri??n!
> 
> Alyssa: Could you give your R-b if you're happy with this change? It
> would be good to get this fixed before it hits -rc1.

R-b, however the issue isn't totally gone: in a separate but related
issue, apparently the __le types aren't portable and the devcoredump
support has now broken the panfrost (mesa) build for FreeBSD, which has
a UAPI-compatible reimplementation of panfrost.ko ...

Do we maybe want to change all the __le to u at the same time? If we
have to break UAPI, better do it before the UAPI is actually merged.
Panfrost is probably broken in far worse ways on big endian anyway. Or
maybe we want to keep doing little-endian but in u32 containers and have
conversions in the kernel for big-endian CPUs. Or maybe we want to just
"we don't care about big endian, because you'll have worse problems", at
a GPU level Mali hasn't supported big endian since Midgard so I doubt
the recent DDKs would work on BE either.

Anyway, ideally we'd solve both at once, and soon, so we don't have to
revert the devcoredump stuff from mesa.

Thanks,

Alyssa

Re: [PATCH] drm/panfrost: Give name to anonymous coredump object union

2022-09-12 Thread Alyssa Rosenzweig

Have we checked that this actually fixes the Mesa build? If so, R-b.

> Commit 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
> introduces one such union, breaking the Mesa build.
> 
> Give it a name, and also rename pan_reg_hdr structure because it will
> always be prefixed by the union name.
> 
> Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7195
> 
> Signed-off-by: Adri??n Larumbe 

In Mesa, we would add a trailer "Fixes: 730c2bf4ad39 ("drm/panfrost: Add
support for devcoredump")". If the kernel does the same (I don't
remember), we should do that here, seeing as the panfrost uapi headers
do need to build as C++.

Re: [PATCH] drm/panfrost: Update io-pgtable API

2022-08-31 Thread Alyssa Rosenzweig

On Tue, Aug 23, 2022 at 11:42:33AM +0100, Robin Murphy wrote:
> On 2022-08-23 03:51, Alyssa Rosenzweig wrote:
> > > -static size_t get_pgsize(u64 addr, size_t size)
> > > +static size_t get_pgsize(u64 addr, size_t size, size_t *count)
> > >   {
> > > - if (addr & (SZ_2M - 1) || size < SZ_2M)
> > > - return SZ_4K;
> > > + size_t blk_offset = -addr % SZ_2M;
> > 
> > addr is unsigned. if this is correct, it's magic.
> 
> Eh, it's just well-defined unsigned integer overflow. Take "SZ_2M - (addr %
> SZ_2M)", realise the first term can be anything that's zero modulo SZ_2M,
> including zero, then also that the operations can be done in either order to
> give the same result, and there you go.

Shrug. It still seems voodoo to me but if this is normal kernel style
I'm not going to complain, Acked-by for the patch regardless.

Re: [PATCH] drm/panfrost: Update io-pgtable API

2022-08-22 Thread Alyssa Rosenzweig

> -static size_t get_pgsize(u64 addr, size_t size)
> +static size_t get_pgsize(u64 addr, size_t size, size_t *count)
>  {
> - if (addr & (SZ_2M - 1) || size < SZ_2M)
> - return SZ_4K;
> + size_t blk_offset = -addr % SZ_2M;

addr is unsigned. if this is correct, it's magic.

Re: [PATCH v6 1/2] drm/panfrost: Add specific register offset macros for JS and MMU AS

2022-07-29 Thread Alyssa Rosenzweig

Reviewed-by: Alyssa Rosenzweig 

On Fri, Jul 29, 2022 at 03:46:09PM +0100, Adri??n Larumbe wrote:
> Each Panfrost job has its own job slot and MMU address space set of
> registers, which are selected with a job-specific index.
> 
> Turn the shift and stride used for selection of the right register set base
> into a define rather than using magic numbers.
> 
> Signed-off-by: Adri??n Larumbe 
> ---
>  drivers/gpu/drm/panfrost/panfrost_regs.h | 42 ++--
>  1 file changed, 24 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
> b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index accb4fa3adb8..919f44ac853d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -226,23 +226,25 @@
>  #define JOB_INT_MASK_DONE(j) BIT(j)
>  
>  #define JS_BASE  0x1800
> -#define JS_HEAD_LO(n)(JS_BASE + ((n) * 0x80) + 0x00)
> -#define JS_HEAD_HI(n)(JS_BASE + ((n) * 0x80) + 0x04)
> -#define JS_TAIL_LO(n)(JS_BASE + ((n) * 0x80) + 0x08)
> -#define JS_TAIL_HI(n)(JS_BASE + ((n) * 0x80) + 0x0c)
> -#define JS_AFFINITY_LO(n)(JS_BASE + ((n) * 0x80) + 0x10)
> -#define JS_AFFINITY_HI(n)(JS_BASE + ((n) * 0x80) + 0x14)
> -#define JS_CONFIG(n) (JS_BASE + ((n) * 0x80) + 0x18)
> -#define JS_XAFFINITY(n)  (JS_BASE + ((n) * 0x80) + 0x1c)
> -#define JS_COMMAND(n)(JS_BASE + ((n) * 0x80) + 0x20)
> -#define JS_STATUS(n) (JS_BASE + ((n) * 0x80) + 0x24)
> -#define JS_HEAD_NEXT_LO(n)   (JS_BASE + ((n) * 0x80) + 0x40)
> -#define JS_HEAD_NEXT_HI(n)   (JS_BASE + ((n) * 0x80) + 0x44)
> -#define JS_AFFINITY_NEXT_LO(n)   (JS_BASE + ((n) * 0x80) + 0x50)
> -#define JS_AFFINITY_NEXT_HI(n)   (JS_BASE + ((n) * 0x80) + 0x54)
> -#define JS_CONFIG_NEXT(n)(JS_BASE + ((n) * 0x80) + 0x58)
> -#define JS_COMMAND_NEXT(n)   (JS_BASE + ((n) * 0x80) + 0x60)
> -#define JS_FLUSH_ID_NEXT(n)  (JS_BASE + ((n) * 0x80) + 0x70)
> +#define JS_SLOT_STRIDE   0x80
> +
> +#define JS_HEAD_LO(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x00)
> +#define JS_HEAD_HI(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x04)
> +#define JS_TAIL_LO(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x08)
> +#define JS_TAIL_HI(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x0c)
> +#define JS_AFFINITY_LO(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x10)
> +#define JS_AFFINITY_HI(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x14)
> +#define JS_CONFIG(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x18)
> +#define JS_XAFFINITY(n)  (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x1c)
> +#define JS_COMMAND(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x20)
> +#define JS_STATUS(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x24)
> +#define JS_HEAD_NEXT_LO(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x40)
> +#define JS_HEAD_NEXT_HI(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x44)
> +#define JS_AFFINITY_NEXT_LO(n)   (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x50)
> +#define JS_AFFINITY_NEXT_HI(n)   (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x54)
> +#define JS_CONFIG_NEXT(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x58)
> +#define JS_COMMAND_NEXT(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x60)
> +#define JS_FLUSH_ID_NEXT(n)  (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x70)
>  
>  /* Possible values of JS_CONFIG and JS_CONFIG_NEXT registers */
>  #define JS_CONFIG_START_FLUSH_CLEAN  BIT(8)
> @@ -281,7 +283,9 @@
>  #define AS_COMMAND_FLUSH_MEM 0x05/* Wait for memory accesses to 
> complete, flush all the L1s cache then
>  flush all L2 caches then 
> issue a flush region command to all MMUs */
>  
> -#define MMU_AS(as)   (0x2400 + ((as) << 6))
> +#define MMU_BASE 0x2400
> +#define MMU_AS_SHIFT 0x06
> +#define MMU_AS(as)   (MMU_BASE + ((as) << MMU_AS_SHIFT))
>  
>  #define AS_TRANSTAB_LO(as)   (MMU_AS(as) + 0x00) /* (RW) Translation 
> Table Base Address for address space n, low word */
>  #define AS_TRANSTAB_HI(as)   (MMU_AS(as) + 0x04) /* (RW) Translation 
> Table Base Address for address space n, high word */
> @@ -300,6 +304,8 @@
>  #define AS_FAULTE

Re: [PATCH v4 1/2] drm/panfrost: Add specific register offset macros for JS and MMU AS

2022-06-22 Thread Alyssa Rosenzweig

Reviewed-by: Alyssa Rosenzweig 

On Wed, Jun 22, 2022 at 03:36:15PM +0100, Adri??n Larumbe wrote:
> Each Panfrost job has its own job slot and MMU address space set of
> registers, which are selected with a job-specific index.
> 
> Turn the shift and stride used for selection of the right register set base
> into a define rather than using magic numbers.
> 
> Signed-off-by: Adri??n Larumbe 
> ---
>  drivers/gpu/drm/panfrost/panfrost_regs.h | 39 +---
>  1 file changed, 21 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
> b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index accb4fa3adb8..1ddc6c4c5e1c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -225,24 +225,26 @@
>  #define JOB_INT_MASK_ERR(j)  BIT((j) + 16)
>  #define JOB_INT_MASK_DONE(j) BIT(j)
>  
> +#define JS_SLOT_STRIDE   0x80
> +
>  #define JS_BASE  0x1800
> -#define JS_HEAD_LO(n)(JS_BASE + ((n) * 0x80) + 0x00)
> -#define JS_HEAD_HI(n)(JS_BASE + ((n) * 0x80) + 0x04)
> -#define JS_TAIL_LO(n)(JS_BASE + ((n) * 0x80) + 0x08)
> -#define JS_TAIL_HI(n)(JS_BASE + ((n) * 0x80) + 0x0c)
> -#define JS_AFFINITY_LO(n)(JS_BASE + ((n) * 0x80) + 0x10)
> -#define JS_AFFINITY_HI(n)(JS_BASE + ((n) * 0x80) + 0x14)
> -#define JS_CONFIG(n) (JS_BASE + ((n) * 0x80) + 0x18)
> -#define JS_XAFFINITY(n)  (JS_BASE + ((n) * 0x80) + 0x1c)
> -#define JS_COMMAND(n)(JS_BASE + ((n) * 0x80) + 0x20)
> -#define JS_STATUS(n) (JS_BASE + ((n) * 0x80) + 0x24)
> -#define JS_HEAD_NEXT_LO(n)   (JS_BASE + ((n) * 0x80) + 0x40)
> -#define JS_HEAD_NEXT_HI(n)   (JS_BASE + ((n) * 0x80) + 0x44)
> -#define JS_AFFINITY_NEXT_LO(n)   (JS_BASE + ((n) * 0x80) + 0x50)
> -#define JS_AFFINITY_NEXT_HI(n)   (JS_BASE + ((n) * 0x80) + 0x54)
> -#define JS_CONFIG_NEXT(n)(JS_BASE + ((n) * 0x80) + 0x58)
> -#define JS_COMMAND_NEXT(n)   (JS_BASE + ((n) * 0x80) + 0x60)
> -#define JS_FLUSH_ID_NEXT(n)  (JS_BASE + ((n) * 0x80) + 0x70)
> +#define JS_HEAD_LO(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x00)
> +#define JS_HEAD_HI(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x04)
> +#define JS_TAIL_LO(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x08)
> +#define JS_TAIL_HI(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x0c)
> +#define JS_AFFINITY_LO(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x10)
> +#define JS_AFFINITY_HI(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x14)
> +#define JS_CONFIG(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x18)
> +#define JS_XAFFINITY(n)  (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x1c)
> +#define JS_COMMAND(n)(JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x20)
> +#define JS_STATUS(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x24)
> +#define JS_HEAD_NEXT_LO(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x40)
> +#define JS_HEAD_NEXT_HI(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x44)
> +#define JS_AFFINITY_NEXT_LO(n)   (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x50)
> +#define JS_AFFINITY_NEXT_HI(n)   (JS_BASE + ((n) * 
> JS_SLOT_STRIDE) + 0x54)
> +#define JS_CONFIG_NEXT(n)(JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x58)
> +#define JS_COMMAND_NEXT(n)   (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x60)
> +#define JS_FLUSH_ID_NEXT(n)  (JS_BASE + ((n) * JS_SLOT_STRIDE) + 
> 0x70)
>  
>  /* Possible values of JS_CONFIG and JS_CONFIG_NEXT registers */
>  #define JS_CONFIG_START_FLUSH_CLEAN  BIT(8)
> @@ -281,7 +283,8 @@
>  #define AS_COMMAND_FLUSH_MEM 0x05/* Wait for memory accesses to 
> complete, flush all the L1s cache then
>  flush all L2 caches then 
> issue a flush region command to all MMUs */
>  
> -#define MMU_AS(as)   (0x2400 + ((as) << 6))
> +#define MMU_AS_SHIFT 0x06
> +#define MMU_AS(as)   (0x2400 + ((as) << MMU_AS_SHIFT))
>  
>  #define AS_TRANSTAB_LO(as)   (MMU_AS(as) + 0x00) /* (RW) Translation 
> Table Base Address for address space n, low word */
>  #define AS_TRANSTAB_HI(as)   (MMU_AS(as) + 0x04) /* (RW) Translation 
> Table Base Address for address space n, high word */
> -- 
> 2.36.1
>

Re: [PATCH v3 1/1] drm/panfrost: Add support for devcoredump

2022-06-22 Thread Alyssa Rosenzweig

> Sorry about this blunder.
> 
> >> +  slot = panfrost_job_get_slot(job);
> >> +  slot = slot ? slot : 0;
> >
> >`slot = slot ? slot : 0` is a no-op. Delete the line.
> 
> I think what I meant here was 'slot = (slot >= 0) ? slot : 0;' but for some
> reason I blundered again. The point of this was ensuring the slot value 
> wouldn't
> end up wrapping about the maximum unsigned integer value when using it as an
> array offset, in the off-chance that panfrost_job_get_slot() ever returned a
> negative value.
> 
> In v4 I've instead rewritten this as a sanity check:
> 
> WARN_ON(slot < 0);

No, this doesn't make sense. There at most 3 job slots -- 0, 1, and 2.

> Although perhaps in the future panfrost_job_get_slot should return an unsigned
> integer instead?

Sure. Kernel style doesn't seem big on unsigned, if this were
mesa it would return unsigned. Returning u8 or u32 seems reasonable at
any rate.

> >As a general note, I'd appreciate breaking out the panfrost_regs.h
> >changes into a separate patch, as they are a logically separate clean
> >up to make room for this patch. Thanks.
> 
> Done in v4.

Thanks!

Alyssa

Re: [PATCH v3 1/1] drm/panfrost: Add support for devcoredump

2022-06-22 Thread Alyssa Rosenzweig

> + js_as_offset = slot * 0x80;

JS_SLOT_STRIDE

> + slot = panfrost_job_get_slot(job);
> + slot = slot ? slot : 0;

`slot = slot ? slot : 0` is a no-op. Delete the line.

> + if (!IS_ERR(page))
> + *bomap++ = cpu_to_le64(page_to_phys(page));
> + else {
> + dev_err(pfdev->dev, "Panfrost Dump: wrong 
> page\n");
> + *bomap++ = ~cpu_to_le64(0);
> + }
> + }

Nit: because you have { braces } around half the if, please add
{ braces } around the other half for consistency.

---

As a general note, I'd appreciate breaking out the panfrost_regs.h
changes into a separate patch, as they are a logically separate clean
up to make room for this patch. Thanks.

Re: [PATCH v2 1/1] drm/panfrost: Add support for devcoredump

2022-06-22 Thread Alyssa Rosenzweig

> > > + iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN |
> > > + __GFP_NORETRY);
> > > + if (!iter.start) {
> > > + dev_warn(pfdev->dev, "failed to allocate devcoredump file\n");
> > > + return;
> > > + }
> > > ...
> > > + memset(iter.hdr, 0, iter.data - iter.start);
> > 
> > Why are we using __GFP_NOWARN and __GFP_NORETRY? Why not plain vmalloc?
> > 
> > Also, why vmalloc instead of vzalloc? (Or adding __GFP_ZERO to the list
> > of __vmalloc flags if __GFP_NOWARN/__GFP_NORETRY are really needed?) Are
> > there relevant performance or security considerations?
> 
> I borrowed this code from Etnaviv a while ago and the same doubt struck me
> then. My understanding of its intended behaviour is that because the dump file
> might be huge, we don't want the memory manager to trigger the OOM killer and
> annoy quite a few running processes because of a debug feature. Also since the
> code already handles the situation when an allocation fails by refusing to
> generate a dump, there's no need for the allocator to generate specific error
> messages.
> 
> So I guess it boils down to 'if there's quite enough memory to allocate a huge
> dump file, go ahead, otherwise don't reclaim any processes' pages for 
> something
> that isn't essential'.
> 
> I don't see much use for __GFP_ZERO in this case, because the dump file gets
> memcpy'd with the contents of every single bo so whatever the original
> contents of the memory were at the time of the allocation, they're overwritten
> immediately.

I think that's a reasonable explanation, bearing in mind I'm firmly a
userspace person ;-)

Please add a comment explaining the assumptions here, though, because
the code will live longer than this ML thread.

> I've also rebased v3 on top of drm-misc-next and the compiler error because of
> the removed panfrost_job structure member is gone.

Excellent

Re: [kbuild-all] Re: [PATCH v2 1/1] drm/panfrost: Add support for devcoredump

2022-06-22 Thread Alyssa Rosenzweig

Hi Rong Chen,

Sorry for the noise -- I think that was meant for Adrian!

Apologies,

Alyssa

On Wed, Jun 22, 2022 at 10:30:00AM +0800, Chen, Rong A wrote:
> 
> 
> On 6/21/2022 10:32 PM, Alyssa Rosenzweig wrote:
> > > drivers/gpu/drm/panfrost/panfrost_dump.c: In function 
> > > 'panfrost_core_dump':
> > > > > drivers/gpu/drm/panfrost/panfrost_dump.c:115:20: error: 'struct 
> > > > > panfrost_job' has no member named 'file_priv'
> > >   115 | as_nr = job->file_priv->mmu->as;
> > >   |^~
> > 
> > FWIW -- this is due to recent changes in panfrost, you should rebase on
> > the latest drm-misc-next which is where the patch will be applied
> > anyway.
> 
> Hi Alyssa,
> 
> Thanks for your help, we'll try drm-misc-next next time.
> 
> Best Regards,
> Rong Chen

Re: [PATCH v2 1/1] drm/panfrost: Add support for devcoredump

2022-06-21 Thread Alyssa Rosenzweig

>drivers/gpu/drm/panfrost/panfrost_dump.c: In function 'panfrost_core_dump':
> >> drivers/gpu/drm/panfrost/panfrost_dump.c:115:20: error: 'struct 
> >> panfrost_job' has no member named 'file_priv'
>  115 | as_nr = job->file_priv->mmu->as;
>  |^~

FWIW -- this is due to recent changes in panfrost, you should rebase on
the latest drm-misc-next which is where the patch will be applied
anyway.

Re: [PATCH v2 1/1] drm/panfrost: Add support for devcoredump

2022-06-21 Thread Alyssa Rosenzweig

Hi Adrian,

Great work on the devcoredump support! This is really cool to see coming
along, thank you! I've left a few notes below:

> + if (panfrost_dump_registers[i] >= JS_HEAD_LO(0) &&
> + panfrost_dump_registers[i] <= JS_CONFIG_NEXT(0))
> + js_as_offset = slot * 0x80;
> + else if (panfrost_dump_registers[i] >= AS_TRANSTAB_LO(0) &&
> +  panfrost_dump_registers[i] <= AS_STATUS(0))
> + js_as_offset = ((as_nr) << 6);

I'm not a fan of the magic numbers. Do you think it makes sense to add

#define JS_SLOT_STRIDE 0x80
#define MMU_AS_SHIFT 0x6

in the appropriate places in panfrost_regs.h, reexpress the existing
#defines in terms of those

#define JS_HEAD_LO(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x00)
...
#define JS_FLUSH_ID_NEXT(n) (JS_BASE + ((n) * JS_SLOT_STRIDE) + 0x70)
...
#define MM_AS(as) (0x2400 + ((as) << MMU_AS_SHIFT)

and then use those here?

Also, drop the parans around (as_nr), this isn't a macro.

> + /* Add in the active buffer objects */
> + for (i = 0; i < job->bo_count; i++) {
> + dbo = job->bos[i];
> + file_size += dbo->size;
> + n_bomap_pages += dbo->size >> PAGE_SHIFT;
> + n_obj++;
> + }

Strictly, I don't think this is right -- what happens if the CPU is
configured to use 16K or 64K pages? -- however, that mistake is pretty
well entrenched in panfrost.ko right now and it doesn't seem to bother
anyone (non-4K pages on arm64 are pretty rare outside of fruit
computers).

That said, out-of-context there looks like an alignment question. Could
we add an assert for that, documenting the invariant:

WARN_ON(!IS_ALIGNED(dbo->size, PAGE_SIZE));

> + iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN |
> + __GFP_NORETRY);
> + if (!iter.start) {
> + dev_warn(pfdev->dev, "failed to allocate devcoredump file\n");
> + return;
> + }
> ...
> + memset(iter.hdr, 0, iter.data - iter.start);

Why are we using __GFP_NOWARN and __GFP_NORETRY? Why not plain vmalloc?

Also, why vmalloc instead of vzalloc? (Or adding __GFP_ZERO to the list
of __vmalloc flags if __GFP_NOWARN/__GFP_NORETRY are really needed?) Are
there relevant performance or security considerations?

> +/* Definitions for coredump decoding in user space */
> +#define PANFROSTDUMP_VERSION_1 1

I'm not a fan of an enum that just represents a number. Using the
numbers directly means we can compare them in a natural way. Also, using
a major/minor split like Steven suggested can help with semantic
versioning.

Cheers,
Alyssa

Re: [PATCH v6 04/22] drm/panfrost: Fix shrinker list corruption by madvise IOCTL

2022-05-27 Thread Alyssa Rosenzweig

Acked-by: Alyssa Rosenzweig 

On Fri, May 27, 2022 at 02:50:22AM +0300, Dmitry Osipenko wrote:
> Calling madvise IOCTL twice on BO causes memory shrinker list corruption
> and crashes kernel because BO is already on the list and it's added to
> the list again, while BO should be removed from from the list before it's
> re-added. Fix it.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 087e69b98d06..b1e6d238674f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -433,8 +433,8 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, 
> void *data,
>  
>   if (args->retained) {
>   if (args->madv == PANFROST_MADV_DONTNEED)
> - list_add_tail(&bo->base.madv_list,
> -   &pfdev->shrinker_list);
> + list_move_tail(&bo->base.madv_list,
> +&pfdev->shrinker_list);
>   else if (args->madv == PANFROST_MADV_WILLNEED)
>   list_del_init(&bo->base.madv_list);
>   }
> -- 
> 2.35.3
>

Re: [PATCH v6 22/22] drm/panfrost: Switch to generic memory shrinker

2022-05-27 Thread Alyssa Rosenzweig

Acked-by: Alyssa Rosenzweig 

On Fri, May 27, 2022 at 02:50:40AM +0300, Dmitry Osipenko wrote:
> Replace Panfrost's memory shrinker with a generic drm-shmem memory
> shrinker.
> 
> Tested-by: Steven Price 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/panfrost/Makefile |   1 -
>  drivers/gpu/drm/panfrost/panfrost_device.h|   4 -
>  drivers/gpu/drm/panfrost/panfrost_drv.c   |  19 +--
>  drivers/gpu/drm/panfrost/panfrost_gem.c   |  33 +++--
>  drivers/gpu/drm/panfrost/panfrost_gem.h   |   9 --
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  | 129 --
>  drivers/gpu/drm/panfrost/panfrost_job.c   |  18 ++-
>  7 files changed, 42 insertions(+), 171 deletions(-)
>  delete mode 100644 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
> 
> diff --git a/drivers/gpu/drm/panfrost/Makefile 
> b/drivers/gpu/drm/panfrost/Makefile
> index b71935862417..ecf0864cb515 100644
> --- a/drivers/gpu/drm/panfrost/Makefile
> +++ b/drivers/gpu/drm/panfrost/Makefile
> @@ -5,7 +5,6 @@ panfrost-y := \
>   panfrost_device.o \
>   panfrost_devfreq.o \
>   panfrost_gem.o \
> - panfrost_gem_shrinker.o \
>   panfrost_gpu.o \
>   panfrost_job.o \
>   panfrost_mmu.o \
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
> b/drivers/gpu/drm/panfrost/panfrost_device.h
> index 8b25278f34c8..fe04b21fc044 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -115,10 +115,6 @@ struct panfrost_device {
>   atomic_t pending;
>   } reset;
>  
> - struct mutex shrinker_lock;
> - struct list_head shrinker_list;
> - struct shrinker shrinker;
> -
>   struct panfrost_devfreq pfdevfreq;
>  };
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 859e240161d1..b77c99ba2475 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -160,7 +160,6 @@ panfrost_lookup_bos(struct drm_device *dev,
>   break;
>   }
>  
> - atomic_inc(&bo->gpu_usecount);
>   job->mappings[i] = mapping;
>   }
>  
> @@ -392,7 +391,6 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, 
> void *data,
>  {
>   struct panfrost_file_priv *priv = file_priv->driver_priv;
>   struct drm_panfrost_madvise *args = data;
> - struct panfrost_device *pfdev = dev->dev_private;
>   struct drm_gem_object *gem_obj;
>   struct panfrost_gem_object *bo;
>   int ret = 0;
> @@ -409,7 +407,6 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, 
> void *data,
>   if (ret)
>   goto out_put_object;
>  
> - mutex_lock(&pfdev->shrinker_lock);
>   mutex_lock(&bo->mappings.lock);
>   if (args->madv == PANFROST_MADV_DONTNEED) {
>   struct panfrost_gem_mapping *first;
> @@ -435,17 +432,8 @@ static int panfrost_ioctl_madvise(struct drm_device 
> *dev, void *data,
>  
>   args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
>  
> - if (args->retained) {
> - if (args->madv == PANFROST_MADV_DONTNEED)
> - list_move_tail(&bo->base.madv_list,
> -&pfdev->shrinker_list);
> - else if (args->madv == PANFROST_MADV_WILLNEED)
> - list_del_init(&bo->base.madv_list);
> - }
> -
>  out_unlock_mappings:
>   mutex_unlock(&bo->mappings.lock);
> - mutex_unlock(&pfdev->shrinker_lock);
>   dma_resv_unlock(bo->base.base.resv);
>  out_put_object:
>   drm_gem_object_put(gem_obj);
> @@ -577,9 +565,6 @@ static int panfrost_probe(struct platform_device *pdev)
>   ddev->dev_private = pfdev;
>   pfdev->ddev = ddev;
>  
> - mutex_init(&pfdev->shrinker_lock);
> - INIT_LIST_HEAD(&pfdev->shrinker_list);
> -
>   err = panfrost_device_init(pfdev);
>   if (err) {
>   if (err != -EPROBE_DEFER)
> @@ -601,7 +586,7 @@ static int panfrost_probe(struct platform_device *pdev)
>   if (err < 0)
>   goto err_out1;
>  
> - panfrost_gem_shrinker_init(ddev);
> + drm_gem_shmem_shrinker_register(ddev);
>  
>   return 0;
>  
> @@ -619,8 +604,8 @@ static int panfrost_remove(struct platform_device *pdev)
>   struct panfrost_device *pfdev = platform_get_drvdata(pdev);
>   struct drm_device *ddev = pfdev->ddev;
>  
> + drm_gem_shmem_shrinker_unregister(ddev);

[PATCH v2 5/9] drm/panfrost: Add HW_ISSUE_TTRX_3485 quirk

2022-05-25 Thread Alyssa Rosenzweig

TTRX_3485 requires the infamous "dummy job" workaround. I have this
workaround implemented in a local branch, but I have not yet hit a case
that requires it so I cannot test whether the implementation is correct.
In the mean time, add the quirk bit so we can document which platforms
may need it in the future.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_issues.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index e35807e4b743..4d41e0a13867 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -132,6 +132,9 @@ enum panfrost_hw_issue {
 * to hang */
HW_ISSUE_TTRX_3076,
 
+   /* Must issue a dummy job before starting real work to prevent hangs */
+   HW_ISSUE_TTRX_3485,
+
HW_ISSUE_END
 };
 
-- 
2.35.1

[PATCH v2 9/9] drm/panfrost: Add arm,mali-valhall-jm compatible

2022-05-25 Thread Alyssa Rosenzweig

The most important Valhall-specific quirks have been handled, so add the
Valhall compatible and probe.

v2: Use arm,mali-valhall-jm compatible.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 7fcbc2a5b6cd..b48b6f2af029 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -664,6 +664,7 @@ static const struct of_device_id dt_match[] = {
{ .compatible = "arm,mali-t860", .data = &default_data, },
{ .compatible = "arm,mali-t880", .data = &default_data, },
{ .compatible = "arm,mali-bifrost", .data = &default_data, },
+   { .compatible = "arm,mali-valhall-jm", .data = &default_data, },
{ .compatible = "mediatek,mt8183-mali", .data = &mediatek_mt8183_data },
{}
 };
-- 
2.35.1

[PATCH v2 8/9] drm/panfrost: Add Mali-G57 "Natt" support

2022-05-25 Thread Alyssa Rosenzweig

Add the features, issues, and GPU ID for Mali-G57, a first-generation
Valhall GPU. Other first- and second-generation Valhall GPUs should be
similar.

v2: Split out issue list for r0p0 from newer Natt GPUs, as TTRX_3485 was
fixed in r0p1. Unfortunately, MT8192 has a r0p0, so we do need to handle
TTRX_3485.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 12 
 drivers/gpu/drm/panfrost/panfrost_gpu.c  |  3 +++
 drivers/gpu/drm/panfrost/panfrost_issues.h   |  9 +
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 1a8bdebc86a3..7ed0cd3ea2d4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -106,6 +106,18 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_TLS_HASHING) | \
BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
 
+#define hw_features_g57 (\
+   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+   BIT_ULL(HW_FEATURE_XAFFINITY) | \
+   BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+   BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+   BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+   BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
+   BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
+   BIT_ULL(HW_FEATURE_CLEAN_ONLY_SAFE))
+
 static inline bool panfrost_has_hw_feature(struct panfrost_device *pfdev,
   enum panfrost_hw_feature feat)
 {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index e1a6e763d0dc..6452e4e900dd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -201,6 +201,9 @@ static const struct panfrost_model gpu_models[] = {
GPU_MODEL(g52, 0x7002),
GPU_MODEL(g31, 0x7003,
GPU_REV(g31, 1, 0)),
+
+   GPU_MODEL(g57, 0x9001,
+   GPU_REV(g57, 0, 0)),
 };
 
 static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 4d41e0a13867..c5fa9e897a35 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -258,6 +258,15 @@ enum panfrost_hw_issue {
 
 #define hw_issues_g76 0
 
+#define hw_issues_g57 (\
+   BIT_ULL(HW_ISSUE_TTRX_2968_TTRX_3162) | \
+   BIT_ULL(HW_ISSUE_TTRX_3076))
+
+#define hw_issues_g57_r0p0 (\
+   BIT_ULL(HW_ISSUE_TTRX_2968_TTRX_3162) | \
+   BIT_ULL(HW_ISSUE_TTRX_3076) | \
+   BIT_ULL(HW_ISSUE_TTRX_3485))
+
 static inline bool panfrost_has_hw_issue(const struct panfrost_device *pfdev,
 enum panfrost_hw_issue issue)
 {
-- 
2.35.1

[PATCH v2 6/9] drm/panfrost: Add "clean only safe" feature bit

2022-05-25 Thread Alyssa Rosenzweig

Add the HW_FEATURE_CLEAN_ONLY_SAFE bit based on kbase. When I actually
tried to port the logic from kbase, trivial jobs raised Data Invalid
Faults, so this may depend on other coherency details. It's still useful
to have the bit to record the feature bit when adding new models.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 36fadcf9634e..1a8bdebc86a3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -21,6 +21,7 @@ enum panfrost_hw_feature {
HW_FEATURE_TLS_HASHING,
HW_FEATURE_THREAD_GROUP_SPLIT,
HW_FEATURE_IDVS_GROUP_SIZE,
+   HW_FEATURE_CLEAN_ONLY_SAFE,
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
 };
 
-- 
2.35.1

[PATCH v2 7/9] drm/panfrost: Don't set L2_MMU_CONFIG quirks

2022-05-25 Thread Alyssa Rosenzweig

L2_MMU_CONFIG is an implementation-defined register. Different Mali GPUs
define slightly different MAX_READS and MAX_WRITES fields, which
throttle outstanding reads and writes when set to non-zero values. When
left as zero, reads and writes are not throttled.

Both kbase and panfrost always zero these registers. Per discussion with
Steven Price, there are two reasons these quirks may be used:

1. Simulating slower memory subsystems. This use case is only of
   interest to system-on-chip designers; it is not relevant to mainline.

2. Working around broken memory subsystems. Hopefully we never see this
   case in mainline. If we do, we'll need to set this register based on
   an SoC-compatible, rather than generally matching on the GPU model.

To the best of our knowledge, these fields are zero at reset, so the
write is not necessary. Let's remove the write to aid porting to new
Mali GPUs, which have different layouts for the L2_MMU_CONFIG register.

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Steven Price 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 295bef27fb55..e1a6e763d0dc 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -127,18 +127,6 @@ static void panfrost_gpu_init_quirks(struct 
panfrost_device *pfdev)
gpu_write(pfdev, GPU_TILER_CONFIG, quirks);
 
 
-   quirks = gpu_read(pfdev, GPU_L2_MMU_CONFIG);
-
-   /* Limit read & write ID width for AXI */
-   if (panfrost_has_hw_feature(pfdev, 
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
-   quirks &= ~(L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_READS |
-   L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_WRITES);
-   else
-   quirks &= ~(L2_MMU_CONFIG_LIMIT_EXTERNAL_READS |
-   L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES);
-
-   gpu_write(pfdev, GPU_L2_MMU_CONFIG, quirks);
-
quirks = 0;
if ((panfrost_model_eq(pfdev, 0x860) || panfrost_model_eq(pfdev, 
0x880)) &&
pfdev->features.revision >= 0x2000)
-- 
2.35.1

[PATCH v2 2/9] drm/panfrost: Handle HW_ISSUE_TTRX_2968_TTRX_3162

2022-05-25 Thread Alyssa Rosenzweig

Add handling for the HW_ISSUE_TTRX_2968_TTRX_3162 quirk. Logic ported
from kbase. kbase lists this workaround as used on Mali-G57.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c| 3 +++
 drivers/gpu/drm/panfrost/panfrost_issues.h | 3 +++
 drivers/gpu/drm/panfrost/panfrost_regs.h   | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index aa89926742fd..295bef27fb55 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -108,6 +108,9 @@ static void panfrost_gpu_init_quirks(struct panfrost_device 
*pfdev)
quirks |= SC_LS_ALLOW_ATTR_TYPES;
}
 
+   if (panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_2968_TTRX_3162))
+   quirks |= SC_VAR_ALGORITHM;
+
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_TLS_HASHING))
quirks |= SC_TLS_HASH_ENABLE;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 501a76c5e95f..41a714ce6fce 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -125,6 +125,9 @@ enum panfrost_hw_issue {
 * kernel must fiddle with L2 caches to prevent data leakage */
HW_ISSUE_TGOX_R1_1234,
 
+   /* Must set SC_VAR_ALGORITHM */
+   HW_ISSUE_TTRX_2968_TTRX_3162,
+
HW_ISSUE_END
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 0b6cd8fdcb47..accb4fa3adb8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -195,6 +195,7 @@
 #define SC_TLS_HASH_ENABLE BIT(17)
 #define SC_LS_ATTR_CHECK_DISABLE   BIT(18)
 #define SC_ENABLE_TEXGRD_FLAGS BIT(25)
+#define SC_VAR_ALGORITHM   BIT(29)
 /* End SHADER_CONFIG register */
 
 /* TILER_CONFIG register */
-- 
2.35.1

[PATCH v2 4/9] drm/panfrost: Handle HW_ISSUE_TTRX_3076

2022-05-25 Thread Alyssa Rosenzweig

Some Valhall GPUs require resets when encountering bus faults due to
occlusion query writes. Add the issue bit for this and handle it.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 9 +++--
 drivers/gpu/drm/panfrost/panfrost_issues.h | 4 
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index 7f51a4682ccb..ee612303f076 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -11,6 +11,7 @@
 #include "panfrost_device.h"
 #include "panfrost_devfreq.h"
 #include "panfrost_features.h"
+#include "panfrost_issues.h"
 #include "panfrost_gpu.h"
 #include "panfrost_job.h"
 #include "panfrost_mmu.h"
@@ -380,9 +381,13 @@ const char *panfrost_exception_name(u32 exception_code)
 bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
u32 exception_code)
 {
-   /* Right now, none of the GPU we support need a reset, but this
-* might change.
+   /* If an occlusion query write causes a bus fault on affected GPUs,
+* future fragment jobs may hang. Reset to workaround.
 */
+   if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT)
+   return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076);
+
+   /* No other GPUs we support need a reset */
return false;
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 14670ee58ace..e35807e4b743 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -128,6 +128,10 @@ enum panfrost_hw_issue {
/* Must set SC_VAR_ALGORITHM */
HW_ISSUE_TTRX_2968_TTRX_3162,
 
+   /* Bus fault from occlusion query write may cause future fragment jobs
+* to hang */
+   HW_ISSUE_TTRX_3076,
+
HW_ISSUE_END
 };
 
-- 
2.35.1

[PATCH v2 3/9] drm/panfrost: Constify argument to has_hw_issue

2022-05-25 Thread Alyssa Rosenzweig

Logically, this function is free of side effects, so any pointers it
takes should be const. Needed to avoid a warning in the next patch.

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_issues.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 41a714ce6fce..14670ee58ace 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -251,7 +251,7 @@ enum panfrost_hw_issue {
 
 #define hw_issues_g76 0
 
-static inline bool panfrost_has_hw_issue(struct panfrost_device *pfdev,
+static inline bool panfrost_has_hw_issue(const struct panfrost_device *pfdev,
 enum panfrost_hw_issue issue)
 {
return test_bit(issue, pfdev->features.hw_issues);
-- 
2.35.1

[PATCH v2 1/9] dt-bindings: Add compatible for Mali Valhall (JM)

2022-05-25 Thread Alyssa Rosenzweig

>From the kernel's perspective, (pre-CSF, "Job Manager") Valhall is more
or less compatible with Bifrost, although they differ to userspace. Add
a compatible for Valhall to the existing Bifrost bindings documentation.

As the first SoC with a Valhall GPU receiving mainline support, add a
specific compatible for the MediaTek MT8192, which instantiates a
Mali-G57.

v2: Change compatible to arm,mali-valhall-jm (Daniel Stone).

Signed-off-by: Alyssa Rosenzweig 
CC: devicet...@vger.kernel.org
---
 .../bindings/gpu/arm,mali-bifrost.yaml| 25 +++
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml 
b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
index 85f8d4764740..78964c140b46 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
@@ -14,16 +14,21 @@ properties:
 pattern: '^gpu@[a-f0-9]+$'
 
   compatible:
-items:
-  - enum:
-  - amlogic,meson-g12a-mali
-  - mediatek,mt8183-mali
-  - realtek,rtd1619-mali
-  - renesas,r9a07g044-mali
-  - renesas,r9a07g054-mali
-  - rockchip,px30-mali
-  - rockchip,rk3568-mali
-  - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
discoverable
+oneOf:
+  - items:
+  - enum:
+  - amlogic,meson-g12a-mali
+  - mediatek,mt8183-mali
+  - realtek,rtd1619-mali
+  - renesas,r9a07g044-mali
+  - renesas,r9a07g054-mali
+  - rockchip,px30-mali
+  - rockchip,rk3568-mali
+  - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
discoverable
+  - items:
+  - enum:
+  - mediatek,mt8192-mali
+  - const: arm,mali-valhall-jm # Mali Valhall GPU model/revision is 
fully discoverable
 
   reg:
 maxItems: 1
-- 
2.35.1

[PATCH v2 0/9] drm/panfrost: Valhall (JM) support

2022-05-25 Thread Alyssa Rosenzweig

Here is version 2 of the series adding support for job manager Valhall
(v9). CSF Valhall is not supported in this series. The core
issues/features are added for Mali-G57 "Natt" as the current target.
Natt is used in MT8192, which needs a few extra patches to follow
(currently blocked on MediaTek integration issues.)

In terms of userspace, Mesa has almost all the required code for GLES3.1
conformance and is just missing a few patches to merge for remaining
features.

v2 addresses minor issues found in v1, but no major changes.

Alyssa Rosenzweig (9):
  dt-bindings: Add compatibles for Mali Valhall GPU
  drm/panfrost: Handle HW_ISSUE_TTRX_2968_TTRX_3162
  drm/panfrost: Constify argument to has_hw_issue
  drm/panfrost: Handle HW_ISSUE_TTRX_3076
  drm/panfrost: Add HW_ISSUE_TTRX_3485 quirk
  drm/panfrost: Add "clean only safe" feature bit
  drm/panfrost: Don't set L2_MMU_CONFIG quirks
  drm/panfrost: Add Mali-G57 "Natt" support
  drm/panfrost: Add arm,mali-valhall-jm compatible

 .../bindings/gpu/arm,mali-bifrost.yaml| 53 +++
 drivers/gpu/drm/panfrost/panfrost_device.c|  9 +++-
 drivers/gpu/drm/panfrost/panfrost_drv.c   |  1 +
 drivers/gpu/drm/panfrost/panfrost_features.h  | 13 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c   | 18 +++
 drivers/gpu/drm/panfrost/panfrost_issues.h| 21 +++-
 drivers/gpu/drm/panfrost/panfrost_regs.h  |  1 +
 7 files changed, 91 insertions(+), 25 deletions(-)

-- 
2.35.1

Re: [PATCH] drm/panfrost: Job should reference MMU not file_priv

2022-05-24 Thread Alyssa Rosenzweig

Acked-by: Alyssa Rosenzweig

Re: [PATCH v2 8/8] drm/panfrost: Switch to generic memory shrinker

2022-03-14 Thread Alyssa Rosenzweig

On Tue, Mar 15, 2022 at 01:42:53AM +0300, Dmitry Osipenko wrote:
> Replace Panfrost's memory shrinker with a generic DRM memory shrinker.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/panfrost/Makefile  |  1 -
>  drivers/gpu/drm/panfrost/panfrost_device.h |  4 
>  drivers/gpu/drm/panfrost/panfrost_drv.c| 19 ++-
>  drivers/gpu/drm/panfrost/panfrost_gem.c| 27 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.h|  9 
>  drivers/gpu/drm/panfrost/panfrost_job.c| 22 +-
>  6 files changed, 40 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/Makefile 
> b/drivers/gpu/drm/panfrost/Makefile
> index b71935862417..ecf0864cb515 100644
> --- a/drivers/gpu/drm/panfrost/Makefile
> +++ b/drivers/gpu/drm/panfrost/Makefile
> @@ -5,7 +5,6 @@ panfrost-y := \
>   panfrost_device.o \
>   panfrost_devfreq.o \
>   panfrost_gem.o \
> - panfrost_gem_shrinker.o \
>   panfrost_gpu.o \
>   panfrost_job.o \
>   panfrost_mmu.o \

I'm not sure you actually deleted gem_shrinker anywhere in this patch?
Diff stat is too small.

Re: [PATCH] drm/panfrost: Dynamically allocate pm_domains

2022-02-15 Thread Alyssa Rosenzweig

> I'd do the oneliner changing it to 5 and be done with it. That being
> said, we have plenty of examples of doing this both ways, so whatever
> makes people happy.

Excellent, that's the patch I wrote originally :-)

Dropping this patch, unless Angelo (or someone else) strongly objects.

Re: [PATCH] drm/panfrost: Dynamically allocate pm_domains

2022-02-14 Thread Alyssa Rosenzweig

mali_kbase hardcodes MAX_PM_DOMAINS (=5 for the mt8192 kernel). I have
no real objection to it but Angelo did. Maybe should've marked this RFC.

On Mon, Feb 14, 2022 at 03:31:32PM -0500, Alyssa Rosenzweig wrote:
> MT8192 requires 5 power domains. Rather than bump MAX_PM_DOMAINS and
> waste memory on every supported Panfrost chip, instead dynamically
> allocate pm_domain_devs and pm_domain_links. This adds some flexibility;
> it seems inevitable a new MediaTek device will require more than 5
> domains.
> 
> On non-MediaTek devices, this saves a small amount of memory.
> 
> Suggested-by: AngeloGioacchino Del Regno 
> 
> Signed-off-by: Alyssa Rosenzweig 
> ---
>  drivers/gpu/drm/panfrost/panfrost_device.c | 14 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h |  5 ++---
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
> b/drivers/gpu/drm/panfrost/panfrost_device.c
> index ee612303f076..661cdec320af 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -127,7 +127,10 @@ static void panfrost_pm_domain_fini(struct 
> panfrost_device *pfdev)
>  {
>   int i;
>  
> - for (i = 0; i < ARRAY_SIZE(pfdev->pm_domain_devs); i++) {
> + if (!pfdev->pm_domain_devs || !pfdev->pm_domain_links)
> + return;
> +
> + for (i = 0; i < pfdev->comp->num_pm_domains; i++) {
>   if (!pfdev->pm_domain_devs[i])
>   break;
>  
> @@ -161,9 +164,12 @@ static int panfrost_pm_domain_init(struct 
> panfrost_device *pfdev)
>   return -EINVAL;
>   }
>  
> - if (WARN(num_domains > ARRAY_SIZE(pfdev->pm_domain_devs),
> - "Too many supplies in compatible structure.\n"))
> - return -EINVAL;
> + pfdev->pm_domain_devs = devm_kcalloc(pfdev->dev, num_domains,
> +  sizeof(*pfdev->pm_domain_devs),
> +  GFP_KERNEL);
> + pfdev->pm_domain_links = devm_kcalloc(pfdev->dev, num_domains,
> +   sizeof(*pfdev->pm_domain_links),
> +   GFP_KERNEL);
>  
>   for (i = 0; i < num_domains; i++) {
>   pfdev->pm_domain_devs[i] =
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
> b/drivers/gpu/drm/panfrost/panfrost_device.h
> index 8b25278f34c8..98e3039696f9 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -22,7 +22,6 @@ struct panfrost_job;
>  struct panfrost_perfcnt;
>  
>  #define NUM_JOB_SLOTS 3
> -#define MAX_PM_DOMAINS 3
>  
>  struct panfrost_features {
>   u16 id;
> @@ -87,8 +86,8 @@ struct panfrost_device {
>   struct regulator_bulk_data *regulators;
>   struct reset_control *rstc;
>   /* pm_domains for devices with more than one. */
> - struct device *pm_domain_devs[MAX_PM_DOMAINS];
> - struct device_link *pm_domain_links[MAX_PM_DOMAINS];
> + struct device **pm_domain_devs;
> + struct device_link **pm_domain_links;
>   bool coherent;
>  
>   struct panfrost_features features;
> -- 
> 2.34.1
>

[PATCH] drm/panfrost: Dynamically allocate pm_domains

2022-02-14 Thread Alyssa Rosenzweig

MT8192 requires 5 power domains. Rather than bump MAX_PM_DOMAINS and
waste memory on every supported Panfrost chip, instead dynamically
allocate pm_domain_devs and pm_domain_links. This adds some flexibility;
it seems inevitable a new MediaTek device will require more than 5
domains.

On non-MediaTek devices, this saves a small amount of memory.

Suggested-by: AngeloGioacchino Del Regno 

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 14 ++
 drivers/gpu/drm/panfrost/panfrost_device.h |  5 ++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index ee612303f076..661cdec320af 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -127,7 +127,10 @@ static void panfrost_pm_domain_fini(struct panfrost_device 
*pfdev)
 {
int i;
 
-   for (i = 0; i < ARRAY_SIZE(pfdev->pm_domain_devs); i++) {
+   if (!pfdev->pm_domain_devs || !pfdev->pm_domain_links)
+   return;
+
+   for (i = 0; i < pfdev->comp->num_pm_domains; i++) {
if (!pfdev->pm_domain_devs[i])
break;
 
@@ -161,9 +164,12 @@ static int panfrost_pm_domain_init(struct panfrost_device 
*pfdev)
return -EINVAL;
}
 
-   if (WARN(num_domains > ARRAY_SIZE(pfdev->pm_domain_devs),
-   "Too many supplies in compatible structure.\n"))
-   return -EINVAL;
+   pfdev->pm_domain_devs = devm_kcalloc(pfdev->dev, num_domains,
+sizeof(*pfdev->pm_domain_devs),
+GFP_KERNEL);
+   pfdev->pm_domain_links = devm_kcalloc(pfdev->dev, num_domains,
+ sizeof(*pfdev->pm_domain_links),
+ GFP_KERNEL);
 
for (i = 0; i < num_domains; i++) {
pfdev->pm_domain_devs[i] =
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..98e3039696f9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -22,7 +22,6 @@ struct panfrost_job;
 struct panfrost_perfcnt;
 
 #define NUM_JOB_SLOTS 3
-#define MAX_PM_DOMAINS 3
 
 struct panfrost_features {
u16 id;
@@ -87,8 +86,8 @@ struct panfrost_device {
struct regulator_bulk_data *regulators;
struct reset_control *rstc;
/* pm_domains for devices with more than one. */
-   struct device *pm_domain_devs[MAX_PM_DOMAINS];
-   struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+   struct device **pm_domain_devs;
+   struct device_link **pm_domain_links;
bool coherent;
 
struct panfrost_features features;
-- 
2.34.1

Re: [PATCH 5/9] drm/panfrost: Add HW_ISSUE_TTRX_3485 quirk

2022-02-14 Thread Alyssa Rosenzweig

> > TTRX_3485 requires the infamous "dummy job" workaround. I have this
> > workaround implemented in a local branch, but I have not yet hit a case
> > that requires it so I cannot test whether the implementation is correct.
> > In the mean time, add the quirk bit so we can document which platforms
> > may need it in the future.
> 
> This one is hideous ;) Although to me this isn't the 'infamous' one as
> it's not the earliest example of a dummy job.

Terrifying. I guess we narrowly avoided the 'replay' workaround which
was far worse than this one...

> However... I believe as Panfrost currently stands this is probably not
> very possible to hit. It requires a job to be stopped (soft or hard) at
> a critical point during submission - which at the moment Panfrost
> basically never does (the exception is if you close the fd immediately
> while a job is in progress). And of course the timing has to be 'just
> right' to hit the bug.

OK, that's good to know. Still "should" be fixed but that definitely
lowers the priority of it. Frankly the multithreading bugs we have on
the CPU side would hang the machine sooner...

Re: [PATCH 4/9] drm/panfrost: Handle HW_ISSUE_TTRX_3076

2022-02-14 Thread Alyssa Rosenzweig

On Mon, Feb 14, 2022 at 04:23:18PM +, Steven Price wrote:
> On 11/02/2022 20:27, alyssa.rosenzw...@collabora.com wrote:
> > From: Alyssa Rosenzweig 
> > 
> > Some Valhall GPUs require resets when encountering bus faults due to
> > occlusion query writes. Add the issue bit for this and handle it.
> > 
> > Signed-off-by: Alyssa Rosenzweig 
> 
> Reviewed-by: Steven Price 
> (although one nit below)
> 
> Just in case any one is wondering - these bus faults occur when
> switching the GPU's MMU to unmapped - it's not a normal "bus fault" from
> the external bus. This is triggered by an attempt to read unmapped
> memory which is completed by the driver by switching the entire MMU to
> unmapped.

Ouch, that's subtle.

> > diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
> > b/drivers/gpu/drm/panfrost/panfrost_issues.h
> > index a66692663833..058f6a4c8435 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_issues.h
> > +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
> > @@ -128,6 +128,10 @@ enum panfrost_hw_issue {
> > /* Must set SC_VAR_ALGORITHM */
> > HW_ISSUE_TTRX_2968_TTRX_3162,
> >  
> > +   /* Bus fault from occlusion query write may cause future fragment jobs
> > +* to hang */
> 
> NIT: Kernel comment style has the "/*" and "*/" on lines by themselves
> for multi-line comments. checkpatch will complain!

Yes, I am aware (and checkpatch did complain). The existing multi-line
comments in that file do not have the extra lines. Consistency within
the file seemed like the lesser evil. If you think it's better to
appease checkpatch, I can reformat for v2.

(I can also throw in a patch fixing the rest of that file's multiline
comments but that seems a bit extra.)

Re: [PATCH 8/9] drm/panfrost: Add Mali-G57 "Natt" support

2022-02-14 Thread Alyssa Rosenzweig

> > index b8865fc9efce..1a0dc7f7f857 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_issues.h
> > +++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
> > @@ -258,6 +258,11 @@ enum panfrost_hw_issue {
> >  
> >  #define hw_issues_g76 0
> >  
> > +#define hw_issues_g57 (\
> > +   BIT_ULL(HW_ISSUE_TTRX_2968_TTRX_3162) | \
> > +   BIT_ULL(HW_ISSUE_TTRX_3076) | \
> > +   BIT_ULL(HW_ISSUE_TTRX_3485))
> 
> Do you know whether you have an r0p0 or an r0p1 Natt? Only the r0p0 has
> the 3485 issue, and we might be lucky and it's the r0p1 that's "in the
> wild".

Sadly, I believe I have an r0p0. I don't know if future spins of the
same SoC would be bumped up, but I'm skeptical.

> It would be good to annotate these lists with the hardware revisions
> when there is a difference.

Sure.

Re: [PATCH 6/9] drm/panfrost: Add "clean only safe" feature bit

2022-02-14 Thread Alyssa Rosenzweig

> > Add the HW_FEATURE_CLEAN_ONLY_SAFE bit based on kbase. When I actually
> > tried to port the logic from kbase, trivial jobs raised Data Invalid
> > Faults, so this may depend on other coherency details. It's still useful
> > to have the bit to record the feature bit when adding new models.
> > 
> > Signed-off-by: Alyssa Rosenzweig 
> 
> Reviewed-by: Steven Price 
> 
> Sadly I don't have the hardware to try this out on, but it should be a
> simple case of the below (untested):
> 
> 8<
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
> b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 908d79520853..602e51c4966e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -212,9 +212,13 @@ static void panfrost_job_hw_submit(struct panfrost_job 
> *job, int js)
>  * start */
> cfg |= JS_CONFIG_THREAD_PRI(8) |
> JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE |
> -   JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE |
> panfrost_get_job_chain_flag(job);
>  
> +   if (panfrost_has_hw_feature(pfdev, HW_FEATURE_CLEAN_ONLY_SAFE))
> +   cfg |= JS_CONFIG_END_FLUSH_CLEAN;
> +   else
> +   cfg |= JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE;
> +
> if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
> cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION;

Yes, this is the patch I typed out... causes DATA_INVALID_FAULTs for me
with Mesa. Which makes me wonder if userspace needs to respect some
extra rules for this to be safe.

Re: [PATCH 1/9] dt-bindings: Add arm,mali-valhall compatible

2022-02-14 Thread Alyssa Rosenzweig

> > diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml 
> > b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> > index 63a08f3f321d..48aeabd2ed68 100644
> > --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> > +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> > @@ -23,6 +23,7 @@ properties:
> >- rockchip,px30-mali
> >- rockchip,rk3568-mali
> >- const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
> > discoverable
> > +  - const: arm,mali-valhall # Mali Valhall GPU model/revision is fully 
> > discoverable
> 
> It might be worth spelling out here that this is *pre-CSF* Valhall. I'm
> pretty sure we're going to need different bindings for CSF GPUs.

Yes, agreed, will make a note for v2.

[PATCH 9/9] drm/panfrost: Handle arm,mali-valhall compatible

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

The most important Valhall-specific quirks have been handled, so add the
Valhall compatible and probe.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 96bb5a465627..12977454af75 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -663,6 +663,7 @@ static const struct of_device_id dt_match[] = {
{ .compatible = "arm,mali-t860", .data = &default_data, },
{ .compatible = "arm,mali-t880", .data = &default_data, },
{ .compatible = "arm,mali-bifrost", .data = &default_data, },
+   { .compatible = "arm,mali-valhall", .data = &default_data, },
{ .compatible = "mediatek,mt8183-mali", .data = &mediatek_mt8183_data },
{}
 };
-- 
2.34.1

[PATCH 8/9] drm/panfrost: Add Mali-G57 "Natt" support

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

Add the features, issues, and GPU ID for Mali-G57, a first-generation
Valhall GPU. Other first- and second-generation Valhall GPUs should be
similar.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 12 
 drivers/gpu/drm/panfrost/panfrost_gpu.c  |  2 ++
 drivers/gpu/drm/panfrost/panfrost_issues.h   |  5 +
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 1a8bdebc86a3..7ed0cd3ea2d4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -106,6 +106,18 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_TLS_HASHING) | \
BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
 
+#define hw_features_g57 (\
+   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+   BIT_ULL(HW_FEATURE_XAFFINITY) | \
+   BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+   BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+   BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+   BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
+   BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
+   BIT_ULL(HW_FEATURE_CLEAN_ONLY_SAFE))
+
 static inline bool panfrost_has_hw_feature(struct panfrost_device *pfdev,
   enum panfrost_hw_feature feat)
 {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 73e5774f01c1..08d657527099 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -201,6 +201,8 @@ static const struct panfrost_model gpu_models[] = {
GPU_MODEL(g52, 0x7002),
GPU_MODEL(g31, 0x7003,
GPU_REV(g31, 1, 0)),
+
+   GPU_MODEL(g57, 0x9001),
 };
 
 static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index b8865fc9efce..1a0dc7f7f857 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -258,6 +258,11 @@ enum panfrost_hw_issue {
 
 #define hw_issues_g76 0
 
+#define hw_issues_g57 (\
+   BIT_ULL(HW_ISSUE_TTRX_2968_TTRX_3162) | \
+   BIT_ULL(HW_ISSUE_TTRX_3076) | \
+   BIT_ULL(HW_ISSUE_TTRX_3485))
+
 static inline bool panfrost_has_hw_issue(const struct panfrost_device *pfdev,
 enum panfrost_hw_issue issue)
 {
-- 
2.34.1

[PATCH 6/9] drm/panfrost: Add "clean only safe" feature bit

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

Add the HW_FEATURE_CLEAN_ONLY_SAFE bit based on kbase. When I actually
tried to port the logic from kbase, trivial jobs raised Data Invalid
Faults, so this may depend on other coherency details. It's still useful
to have the bit to record the feature bit when adding new models.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 36fadcf9634e..1a8bdebc86a3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -21,6 +21,7 @@ enum panfrost_hw_feature {
HW_FEATURE_TLS_HASHING,
HW_FEATURE_THREAD_GROUP_SPLIT,
HW_FEATURE_IDVS_GROUP_SIZE,
+   HW_FEATURE_CLEAN_ONLY_SAFE,
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
 };
 
-- 
2.34.1

[PATCH 7/9] drm/panfrost: Don't set L2_MMU_CONFIG quirks

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

L2_MMU_CONFIG is an implementation-defined register. Different Mali GPUs
define slightly different MAX_READS and MAX_WRITES fields, which
throttle outstanding reads and writes when set to non-zero values. When
left as zero, reads and writes are not throttled.

Both kbase and panfrost always zero these registers. Per discussion with
Steven Price, there are two reasons these quirks may be used:

1. Simulating slower memory subsystems. This use case is only of
   interest to system-on-chip designers; it is not relevant to mainline.

2. Working around broken memory subsystems. Hopefully we never see this
   case in mainline. If we do, we'll need to set this register based on
   an SoC-compatible, rather than generally matching on the GPU model.

To the best of our knowledge, these fields are zero at reset, so the
write is not necessary. Let's remove the write to aid porting to new
Mali GPUs, which have different layouts for the L2_MMU_CONFIG register.

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 1c1e2017aa80..73e5774f01c1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -127,18 +127,6 @@ static void panfrost_gpu_init_quirks(struct 
panfrost_device *pfdev)
gpu_write(pfdev, GPU_TILER_CONFIG, quirks);
 
 
-   quirks = gpu_read(pfdev, GPU_L2_MMU_CONFIG);
-
-   /* Limit read & write ID width for AXI */
-   if (panfrost_has_hw_feature(pfdev, 
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
-   quirks &= ~(L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_READS |
-   L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_WRITES);
-   else
-   quirks &= ~(L2_MMU_CONFIG_LIMIT_EXTERNAL_READS |
-   L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES);
-
-   gpu_write(pfdev, GPU_L2_MMU_CONFIG, quirks);
-
quirks = 0;
if ((panfrost_model_eq(pfdev, 0x860) || panfrost_model_eq(pfdev, 
0x880)) &&
pfdev->features.revision >= 0x2000)
-- 
2.34.1

[PATCH 2/9] drm/panfrost: Handle HW_ISSUE_TTRX_2968_TTRX_3162

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

Add handling for the HW_ISSUE_TTRX_2968_TTRX_3162 quirk. Logic ported
from kbase. kbase lists this workaround as used on Mali-G57.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c| 3 +++
 drivers/gpu/drm/panfrost/panfrost_issues.h | 3 +++
 drivers/gpu/drm/panfrost/panfrost_regs.h   | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 50c8922694d7..1c1e2017aa80 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -108,6 +108,9 @@ static void panfrost_gpu_init_quirks(struct panfrost_device 
*pfdev)
quirks |= SC_LS_ALLOW_ATTR_TYPES;
}
 
+   if (panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_2968_TTRX_3162))
+   quirks |= SC_VAR_ALGORITHM;
+
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_TLS_HASHING))
quirks |= SC_TLS_HASH_ENABLE;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 8e59d765bf19..3af7d723377e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -125,6 +125,9 @@ enum panfrost_hw_issue {
 * kernel must fiddle with L2 caches to prevent data leakage */
HW_ISSUE_TGOX_R1_1234,
 
+   /* Must set SC_VAR_ALGORITHM */
+   HW_ISSUE_TTRX_2968_TTRX_3162,
+
HW_ISSUE_END
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 16e776cc82ea..fa1e1af56e17 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -195,6 +195,7 @@
 #define SC_TLS_HASH_ENABLE BIT(17)
 #define SC_LS_ATTR_CHECK_DISABLE   BIT(18)
 #define SC_ENABLE_TEXGRD_FLAGS BIT(25)
+#define SC_VAR_ALGORITHM   BIT(29)
 /* End SHADER_CONFIG register */
 
 /* TILER_CONFIG register */
-- 
2.34.1

[PATCH 5/9] drm/panfrost: Add HW_ISSUE_TTRX_3485 quirk

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

TTRX_3485 requires the infamous "dummy job" workaround. I have this
workaround implemented in a local branch, but I have not yet hit a case
that requires it so I cannot test whether the implementation is correct.
In the mean time, add the quirk bit so we can document which platforms
may need it in the future.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_issues.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 058f6a4c8435..b8865fc9efce 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -132,6 +132,9 @@ enum panfrost_hw_issue {
 * to hang */
HW_ISSUE_TTRX_3076,
 
+   /* Must issue a dummy job before starting real work to prevent hangs */
+   HW_ISSUE_TTRX_3485,
+
HW_ISSUE_END
 };
 
-- 
2.34.1

[PATCH 4/9] drm/panfrost: Handle HW_ISSUE_TTRX_3076

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

Some Valhall GPUs require resets when encountering bus faults due to
occlusion query writes. Add the issue bit for this and handle it.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 9 +++--
 drivers/gpu/drm/panfrost/panfrost_issues.h | 4 
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index 7f51a4682ccb..ee612303f076 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -11,6 +11,7 @@
 #include "panfrost_device.h"
 #include "panfrost_devfreq.h"
 #include "panfrost_features.h"
+#include "panfrost_issues.h"
 #include "panfrost_gpu.h"
 #include "panfrost_job.h"
 #include "panfrost_mmu.h"
@@ -380,9 +381,13 @@ const char *panfrost_exception_name(u32 exception_code)
 bool panfrost_exception_needs_reset(const struct panfrost_device *pfdev,
u32 exception_code)
 {
-   /* Right now, none of the GPU we support need a reset, but this
-* might change.
+   /* If an occlusion query write causes a bus fault on affected GPUs,
+* future fragment jobs may hang. Reset to workaround.
 */
+   if (exception_code == DRM_PANFROST_EXCEPTION_JOB_BUS_FAULT)
+   return panfrost_has_hw_issue(pfdev, HW_ISSUE_TTRX_3076);
+
+   /* No other GPUs we support need a reset */
return false;
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index a66692663833..058f6a4c8435 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -128,6 +128,10 @@ enum panfrost_hw_issue {
/* Must set SC_VAR_ALGORITHM */
HW_ISSUE_TTRX_2968_TTRX_3162,
 
+   /* Bus fault from occlusion query write may cause future fragment jobs
+* to hang */
+   HW_ISSUE_TTRX_3076,
+
HW_ISSUE_END
 };
 
-- 
2.34.1

[PATCH 3/9] drm/panfrost: Constify argument to has_hw_issue

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

Logically, this function is free of side effects, so any pointers it
takes should be const. Needed to avoid a warning in the next patch.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_issues.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 3af7d723377e..a66692663833 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -251,7 +251,7 @@ enum panfrost_hw_issue {
 
 #define hw_issues_g76 0
 
-static inline bool panfrost_has_hw_issue(struct panfrost_device *pfdev,
+static inline bool panfrost_has_hw_issue(const struct panfrost_device *pfdev,
 enum panfrost_hw_issue issue)
 {
return test_bit(issue, pfdev->features.hw_issues);
-- 
2.34.1

[PATCH 1/9] dt-bindings: Add arm,mali-valhall compatible

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

>From the kernel's perspective, pre-CSF Valhall is more or less
compatible with Bifrost, although they differ to userspace. Add a
compatible for Valhall to the existing Bifrost bindings documentation.

Signed-off-by: Alyssa Rosenzweig 
Cc: devicet...@vger.kernel.org
---
 Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml 
b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
index 63a08f3f321d..48aeabd2ed68 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
@@ -23,6 +23,7 @@ properties:
   - rockchip,px30-mali
   - rockchip,rk3568-mali
   - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
discoverable
+  - const: arm,mali-valhall # Mali Valhall GPU model/revision is fully 
discoverable
 
   reg:
 maxItems: 1
-- 
2.34.1

[PATCH 0/9] drm/panfrost: Initial Valhall support

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

This patch series adds preliminary support for Mali "Valhall" GPUs into
the Panfrost kernel driver. The series has been tested on the Mali-G57
on a MediaTek MT8192 system. However, that system requires additional
MediaTek-specific patches [1] as well as core mainlining for MediaTek.
I'll post the MT8192-specific Panfrost patches soon; they depend on this
core series.

On the userspace side, pre-CSF Valhall (what is supported here) uses an
identical UABI as Bifrost. Mesa support for Valhall is being worked on
in parallel [2]. I'm hoping basic support for Valhall will be available
in Mesa 22.1.

[1] https://gitlab.freedesktop.org/panfrost/linux/-/tree/mt8192-branch
[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14795

Alyssa Rosenzweig (9):
  dt-bindings: Add arm,mali-valhall compatible
  drm/panfrost: Handle HW_ISSUE_TTRX_2968_TTRX_3162
  drm/panfrost: Constify argument to has_hw_issue
  drm/panfrost: Handle HW_ISSUE_TTRX_3076
  drm/panfrost: Add HW_ISSUE_TTRX_3485 quirk
  drm/panfrost: Add "clean only safe" feature bit
  drm/panfrost: Don't set L2_MMU_CONFIG quirks
  drm/panfrost: Add Mali-G57 "Natt" support
  drm/panfrost: Handle arm,mali-valhall compatible

 .../bindings/gpu/arm,mali-bifrost.yaml  |  1 +
 drivers/gpu/drm/panfrost/panfrost_device.c  |  9 +++--
 drivers/gpu/drm/panfrost/panfrost_drv.c |  1 +
 drivers/gpu/drm/panfrost/panfrost_features.h| 13 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 17 +
 drivers/gpu/drm/panfrost/panfrost_issues.h  | 17 -
 drivers/gpu/drm/panfrost/panfrost_regs.h|  1 +
 7 files changed, 44 insertions(+), 15 deletions(-)

-- 
2.34.1

[PATCH v2] drm/panfrost: Handle IDVS_GROUP_SIZE feature

2022-02-11 Thread alyssa . rosenzweig

From: Alyssa Rosenzweig 

The IDVS group size feature was missing. It is used on some Bifrost and
Valhall GPUs, and is the last kernel-relevant Bifrost feature we're
missing.

This feature adds an extra IDVS group size field to the JM_CONFIG
register. In kbase, the value is configurable via the device tree; kbase
uses 0xF as a default if no value is specified. Until we find a device
demanding otherwise, let's always set the 0xF default on devices which
support this feature mimicking kbase's behaviour.

Tuning this register slightly improves performance of index-driven vertex
shading. On Mali-G52 (with Mesa), overall glmark2 score is improved from 1026 to
1037. Geometry-heavy scenes like -bshading are improved from 1068 to 1098.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 3 +++
 drivers/gpu/drm/panfrost/panfrost_gpu.c  | 3 +++
 drivers/gpu/drm/panfrost/panfrost_regs.h | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 34f2bae1ec8c..36fadcf9634e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -20,6 +20,7 @@ enum panfrost_hw_feature {
HW_FEATURE_AARCH64_MMU,
HW_FEATURE_TLS_HASHING,
HW_FEATURE_THREAD_GROUP_SPLIT,
+   HW_FEATURE_IDVS_GROUP_SIZE,
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
 };
 
@@ -74,6 +75,7 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
BIT_ULL(HW_FEATURE_COHERENCY_REG))
 
 #define hw_features_g76 (\
@@ -87,6 +89,7 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
BIT_ULL(HW_FEATURE_TLS_HASHING) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
 
 #define hw_features_g31 (\
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index bbe628b306ee..50c8922694d7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -145,6 +145,9 @@ static void panfrost_gpu_init_quirks(struct panfrost_device 
*pfdev)
quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
   JM_FORCE_COHERENCY_FEATURES_SHIFT;
 
+   if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
+   quirks |= JM_DEFAULT_IDVS_GROUP_SIZE << 
JM_IDVS_GROUP_SIZE_SHIFT;
+
if (quirks)
gpu_write(pfdev, GPU_JM_CONFIG, quirks);
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 6c5a11ef1ee8..16e776cc82ea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -208,6 +208,7 @@
 #define JM_MAX_JOB_THROTTLE_LIMIT  0x3F
 #define JM_FORCE_COHERENCY_FEATURES_SHIFT 2
 #define JM_IDVS_GROUP_SIZE_SHIFT   16
+#define JM_DEFAULT_IDVS_GROUP_SIZE 0xF
 #define JM_MAX_IDVS_GROUP_SIZE 0x3F
 
 
-- 
2.34.1

Re: [PATCH 2/2] drm/panfrost: Merge some feature lists

2022-01-13 Thread Alyssa Rosenzweig

> >>> Note that this leaves some unmerged identical Bifrost feature lists, as
> >>> there are more features affecting Bifrost kernel space that we do not
> >>> yet hanlde.
> >>
> >> NIT: s/hanlde/handle/ ;)
> >>
> >> Do you have any features in mind that we're missing? The list looks very
> >> similar to the kbase one. And anyway it is simple enough to split again
> >> if we need to.
> > 
> > Just IDVS group size. For some reason I thought there were more when I
> > wrote that commit message. It's split to avoid churn in that patch.
> > 
> > Logically, this series should contain three patches, with the IDVS group
> > size enablement patch at the end. That was the series I wrote and
> > committed to disk. For review I split it out, since the feature clean-up
> > can land now, while the (RFC) IDVS group size patch needs
> > testing/benchmarking.
> > 
> 
> Ah, of course! That makes perfect sense, but somehow I hadn't managed to
> connect the two.
> 
> I've fixed the typo and pushed to drm-misc-next. And I'll wait for your
> benchmarking on IDVS. Do I get a few minutes break before the Valhall
> patches need reviewing? ;)

Thanks for the push :-) And yes, I'd like to get Valhall userspace up to
shape before trying to shovel code into the kernel ^^ There are some
errata that kbase works around that I haven't implemented workarounds
for yet, and I'd like to figure out how to hit those so I can test that
the workarounds are correct. (Particularly thinking of the dummy job
workaround / GPU hang issue)

Re: [PATCH 2/2] drm/panfrost: Merge some feature lists

2022-01-12 Thread Alyssa Rosenzweig

> > Now that we only list features of interest to kernel space, lots of GPUs
> > have the same feature bits. To cut down on the repetition in the file,
> > merge feature lists that are identical between similar GPUs.
> > 
> > Note that this leaves some unmerged identical Bifrost feature lists, as
> > there are more features affecting Bifrost kernel space that we do not
> > yet hanlde.
> 
> NIT: s/hanlde/handle/ ;)
> 
> Do you have any features in mind that we're missing? The list looks very
> similar to the kbase one. And anyway it is simple enough to split again
> if we need to.

Just IDVS group size. For some reason I thought there were more when I
wrote that commit message. It's split to avoid churn in that patch.

Logically, this series should contain three patches, with the IDVS group
size enablement patch at the end. That was the series I wrote and
committed to disk. For review I split it out, since the feature clean-up
can land now, while the (RFC) IDVS group size patch needs
testing/benchmarking.

Re: [PATCH 1/2] drm/panfrost: Remove features meant for userspace

2022-01-12 Thread Alyssa Rosenzweig

> (although it's a good thing kbase never did this cleanup - it's a useful
> source of public information ;) )

Haha, yes. Actually, kbase did do the clean up recently (Valhall era
kbase, I guess). To be fair, I still don't know what some of these were,
like "T7xx pairing rules"... Presumably something dreadfully
Midgard-compiler specific.

Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs

2022-01-10 Thread Alyssa Rosenzweig

> Whether it's worth the effort depends on whether anyone really cares
> about getting the full performance out of this particular GPU.
> 
> At this stage I think the main UABI change would be to add the opposite
> flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
> opt-in to allowing the job to run across all cores.
> 
> The second change would be to allow compute jobs to be run on the second
> core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
> 
> But clearly there's little point adding such flags until someone steps
> up to do the Mesa work.

I worry about the maintainence burden (both Mesa and kernel) of adding
UABI only used by a piece of hardware none of us own, and only useful
"sometimes" for that hardware. Doubly so for the second core group
support; currently Mesa doesn't advertise any compute support on
anything older than Mali T760 ... to the best of my knowledge, nobody
has missed that support either...

To be clear I am in favour of merging the patches needed for GLES2 to
work on all Malis, possibly at a performance cost on these dual-core
systems. That's a far cry from the level of support the DDK gave these
chips back in the day ... of course, the DDK doesn't support them at all
anymore, so Panfrost wins there by default! ;)

Re: [RFC PATCH] drm/panfrost: Handle IDVS_GROUP_SIZE feature

2022-01-10 Thread Alyssa Rosenzweig

> > This feature adds an extra IDVS group size field to the JM_CONFIG
> > register. In kbase, the value is configurable via the device tree; kbase
> > uses 0xF as a default if no value is specified. Until we find a device
> > demanding otherwise, let's always set the 0xF default on devices which
> > support this feature mimicking kbase's behaviour.
> 
> This is a performance thing - so I don't think it will break anything if
> this is wrong, it just won't be optimal.

Then interpret my remarks as hardcoding the default until we find a
device where setting to something other than 0xF improves performance
nontrivially. (Read: I am lazy and do not want to write dt-bindings for
something nobody will ever use.)

> > As JM_CONFIG is an undocumented register, it's not clear to me what
> > happens if we fail to include this handling. Index-driven vertex shading
> > already works on Bifrost boards with this feature without this handling.
> > Perhaps this has performance implications? Patch untested for the
> > moment, wanted to give Steven a chance to comment.
> 
> As it's a performance thing you shouldn't see correctness issues with
> not setting it. But 0xF seems to have been chosen as it gave the best
> overall performance (although for individual test content this can
> vary). AFAICT the performance impact isn't massive either.

Good to know, will update the commit message accordingly.

> Reviewed-by: Steven Price 
> 
> Since you've tagged this RFC I won't merge it now, but it looks correct
> to me.

Thanks for the review... I hope you like reviewing Panfrost patches
because I have a Valhall bring-up series waiting o:)

When I get a chance to uprev the kernel on my G52 board I'll see if I
can benchmark the impact of this change, so far this is only
compile-tested. Even if there's no impact the patch should likely go in
to stay consistent with kbase, but hopefully there's a win from this. At
that point I'll send a v2 with your reviewed-by (and hopefully no
changes other than the commit message) and we'll land that.

Re: [RFC PATCH] drm/panfrost: Handle IDVS_GROUP_SIZE feature

2022-01-09 Thread Alyssa Rosenzweig

kbase dt-bindings say that tasks are sent to cores in groups of N + 1,
where N is the value here. So our old behaviour sends tasks in groups of
1; the new behaviour sends tasks in groups of 16. I assume this has
performance implications but no conformance implications.

Searching GitHub, I can't find any device trees that set
idvs-group-size out of the many random Android forks people have
uploaded, so I don't think this will matter for any production device.
(Was this a workaround for preproduction silicon? or FPGAs? or was this
an option for the sake of having an option?)

On Sun, Jan 09, 2022 at 12:12:54PM -0500, Alyssa Rosenzweig wrote:
> The IDVS group size feature was missing. It is used on some Bifrost and
> Valhall GPUs, and is the last kernel-relevant Bifrost feature we're
> missing.
> 
> This feature adds an extra IDVS group size field to the JM_CONFIG
> register. In kbase, the value is configurable via the device tree; kbase
> uses 0xF as a default if no value is specified. Until we find a device
> demanding otherwise, let's always set the 0xF default on devices which
> support this feature mimicking kbase's behaviour.
> 
> As JM_CONFIG is an undocumented register, it's not clear to me what
> happens if we fail to include this handling. Index-driven vertex shading
> already works on Bifrost boards with this feature without this handling.
> Perhaps this has performance implications? Patch untested for the
> moment, wanted to give Steven a chance to comment.
> 
> Applies on top of my feature clean up series which should go in first.
> (That's pure cleaunp, this is a behaviour change RFC needing
> discussion.)
> 
> Signed-off-by: Alyssa Rosenzweig 
> ---
>  drivers/gpu/drm/panfrost/panfrost_features.h | 3 +++
>  drivers/gpu/drm/panfrost/panfrost_gpu.c  | 3 +++
>  drivers/gpu/drm/panfrost/panfrost_regs.h | 1 +
>  3 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
> b/drivers/gpu/drm/panfrost/panfrost_features.h
> index 34f2bae1ec8c..36fadcf9634e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_features.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_features.h
> @@ -20,6 +20,7 @@ enum panfrost_hw_feature {
>   HW_FEATURE_AARCH64_MMU,
>   HW_FEATURE_TLS_HASHING,
>   HW_FEATURE_THREAD_GROUP_SPLIT,
> + HW_FEATURE_IDVS_GROUP_SIZE,
>   HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
>  };
>  
> @@ -74,6 +75,7 @@ enum panfrost_hw_feature {
>   BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
>   BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
>   BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
> + BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
>   BIT_ULL(HW_FEATURE_COHERENCY_REG))
>  
>  #define hw_features_g76 (\
> @@ -87,6 +89,7 @@ enum panfrost_hw_feature {
>   BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
>   BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
>   BIT_ULL(HW_FEATURE_TLS_HASHING) | \
> + BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
>   BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
>  
>  #define hw_features_g31 (\
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
> b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index bbe628b306ee..50c8922694d7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -145,6 +145,9 @@ static void panfrost_gpu_init_quirks(struct 
> panfrost_device *pfdev)
>   quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
>  JM_FORCE_COHERENCY_FEATURES_SHIFT;
>  
> + if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
> + quirks |= JM_DEFAULT_IDVS_GROUP_SIZE << 
> JM_IDVS_GROUP_SIZE_SHIFT;
> +
>   if (quirks)
>   gpu_write(pfdev, GPU_JM_CONFIG, quirks);
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
> b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 6c5a11ef1ee8..16e776cc82ea 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -208,6 +208,7 @@
>  #define JM_MAX_JOB_THROTTLE_LIMIT0x3F
>  #define JM_FORCE_COHERENCY_FEATURES_SHIFT 2
>  #define JM_IDVS_GROUP_SIZE_SHIFT 16
> +#define JM_DEFAULT_IDVS_GROUP_SIZE   0xF
>  #define JM_MAX_IDVS_GROUP_SIZE   0x3F
>  
>  
> -- 
> 2.34.1
>

[RFC PATCH] drm/panfrost: Handle IDVS_GROUP_SIZE feature

2022-01-09 Thread Alyssa Rosenzweig

The IDVS group size feature was missing. It is used on some Bifrost and
Valhall GPUs, and is the last kernel-relevant Bifrost feature we're
missing.

This feature adds an extra IDVS group size field to the JM_CONFIG
register. In kbase, the value is configurable via the device tree; kbase
uses 0xF as a default if no value is specified. Until we find a device
demanding otherwise, let's always set the 0xF default on devices which
support this feature mimicking kbase's behaviour.

As JM_CONFIG is an undocumented register, it's not clear to me what
happens if we fail to include this handling. Index-driven vertex shading
already works on Bifrost boards with this feature without this handling.
Perhaps this has performance implications? Patch untested for the
moment, wanted to give Steven a chance to comment.

Applies on top of my feature clean up series which should go in first.
(That's pure cleaunp, this is a behaviour change RFC needing
discussion.)

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 3 +++
 drivers/gpu/drm/panfrost/panfrost_gpu.c  | 3 +++
 drivers/gpu/drm/panfrost/panfrost_regs.h | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 34f2bae1ec8c..36fadcf9634e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -20,6 +20,7 @@ enum panfrost_hw_feature {
HW_FEATURE_AARCH64_MMU,
HW_FEATURE_TLS_HASHING,
HW_FEATURE_THREAD_GROUP_SPLIT,
+   HW_FEATURE_IDVS_GROUP_SIZE,
HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
 };
 
@@ -74,6 +75,7 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
BIT_ULL(HW_FEATURE_COHERENCY_REG))
 
 #define hw_features_g76 (\
@@ -87,6 +89,7 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
BIT_ULL(HW_FEATURE_TLS_HASHING) | \
+   BIT_ULL(HW_FEATURE_IDVS_GROUP_SIZE) | \
BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
 
 #define hw_features_g31 (\
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index bbe628b306ee..50c8922694d7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -145,6 +145,9 @@ static void panfrost_gpu_init_quirks(struct panfrost_device 
*pfdev)
quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
   JM_FORCE_COHERENCY_FEATURES_SHIFT;
 
+   if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
+   quirks |= JM_DEFAULT_IDVS_GROUP_SIZE << 
JM_IDVS_GROUP_SIZE_SHIFT;
+
if (quirks)
gpu_write(pfdev, GPU_JM_CONFIG, quirks);
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 6c5a11ef1ee8..16e776cc82ea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -208,6 +208,7 @@
 #define JM_MAX_JOB_THROTTLE_LIMIT  0x3F
 #define JM_FORCE_COHERENCY_FEATURES_SHIFT 2
 #define JM_IDVS_GROUP_SIZE_SHIFT   16
+#define JM_DEFAULT_IDVS_GROUP_SIZE 0xF
 #define JM_MAX_IDVS_GROUP_SIZE 0x3F
 
 
-- 
2.34.1

[PATCH 2/2] drm/panfrost: Merge some feature lists

2022-01-09 Thread Alyssa Rosenzweig

Now that we only list features of interest to kernel space, lots of GPUs
have the same feature bits. To cut down on the repetition in the file,
merge feature lists that are identical between similar GPUs.

Note that this leaves some unmerged identical Bifrost feature lists, as
there are more features affecting Bifrost kernel space that we do not
yet hanlde.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 40 
 1 file changed, 7 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index f557fad5d5ff..34f2bae1ec8c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -27,14 +27,9 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
BIT_ULL(HW_FEATURE_V4))
 
-#define hw_features_t620 (\
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
-   BIT_ULL(HW_FEATURE_V4))
-
-#define hw_features_t720 (\
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
-   BIT_ULL(HW_FEATURE_V4))
+#define hw_features_t620 hw_features_t600
 
+#define hw_features_t720 hw_features_t600
 
 #define hw_features_t760 (\
BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
@@ -42,26 +37,13 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_XAFFINITY) | \
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
 
-// T860
-#define hw_features_t860 (\
-   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
-   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
-   BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+#define hw_features_t860 hw_features_t760
 
-#define hw_features_t880 hw_features_t860
+#define hw_features_t880 hw_features_t760
 
-#define hw_features_t830 (\
-   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
-   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
-   BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+#define hw_features_t830 hw_features_t760
 
-#define hw_features_t820 (\
-   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
-   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
-   BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+#define hw_features_t820 hw_features_t760
 
 #define hw_features_g71 (\
BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
@@ -82,15 +64,7 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
BIT_ULL(HW_FEATURE_COHERENCY_REG))
 
-#define hw_features_g51 (\
-   BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
-   BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
-   BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
-   BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
-   BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
-   BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
-   BIT_ULL(HW_FEATURE_COHERENCY_REG))
+#define hw_features_g51 hw_features_g72
 
 #define hw_features_g52 (\
BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
-- 
2.34.1

[PATCH 1/2] drm/panfrost: Remove features meant for userspace

2022-01-09 Thread Alyssa Rosenzweig

Early versions of the legacy kernel driver included comprehensive
feature lists for every GPU, even though most of the enumerated features
only matter to userspace. For example, HW_FEATURE_INTERPIPE_REG_ALIASING
was a feature bit indicating that a GPU had "interpipe register
aliasing": arithmetic, load/store, and texture instruction all use
common general-purpose registers. GPUs without this feature bit have
dedicated load/store and texture "registers". Whether a GPU has this
feature or not is irrelevant to the kernel; it only matters in the
userspace compiler's register allocator. It's silly to enumerate it in
kernel space, and the information is understandably unused. To
underscore the point, this feature only makes sense in the context of
the Midgard instruction set. Bifrost never had dedicated load/store or
texture registers, so the feature bit was vacuously set for all Bifrost
hardware, even though this conveys no useful information.

To clean up the feature list, delete feature bits which could not
possibly matter to the kernel, leaving only those which do affect the
register-level operation of the chip.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/panfrost/panfrost_features.h | 172 ---
 1 file changed, 172 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h 
b/drivers/gpu/drm/panfrost/panfrost_features.h
index 5056777c7744..f557fad5d5ff 100644
--- a/drivers/gpu/drm/panfrost/panfrost_features.h
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -12,24 +12,6 @@ enum panfrost_hw_feature {
HW_FEATURE_JOBCHAIN_DISAMBIGUATION,
HW_FEATURE_PWRON_DURING_PWROFF_TRANS,
HW_FEATURE_XAFFINITY,
-   HW_FEATURE_OUT_OF_ORDER_EXEC,
-   HW_FEATURE_MRT,
-   HW_FEATURE_BRNDOUT_CC,
-   HW_FEATURE_INTERPIPE_REG_ALIASING,
-   HW_FEATURE_LD_ST_TILEBUFFER,
-   HW_FEATURE_MSAA_16X,
-   HW_FEATURE_32_BIT_UNIFORM_ADDRESS,
-   HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL,
-   HW_FEATURE_OPTIMIZED_COVERAGE_MASK,
-   HW_FEATURE_T7XX_PAIRING_RULES,
-   HW_FEATURE_LD_ST_LEA_TEX,
-   HW_FEATURE_LINEAR_FILTER_FLOAT,
-   HW_FEATURE_WORKGROUP_ROUND_MULTIPLE_OF_4,
-   HW_FEATURE_IMAGES_IN_FRAGMENT_SHADERS,
-   HW_FEATURE_TEST4_DATUM_MODE,
-   HW_FEATURE_NEXT_INSTRUCTION_TYPE,
-   HW_FEATURE_BRNDOUT_KILL,
-   HW_FEATURE_WARPING,
HW_FEATURE_V4,
HW_FEATURE_FLUSH_REDUCTION,
HW_FEATURE_PROTECTED_MODE,
@@ -42,27 +24,15 @@ enum panfrost_hw_feature {
 };
 
 #define hw_features_t600 (\
-   BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
-   BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
BIT_ULL(HW_FEATURE_V4))
 
 #define hw_features_t620 (\
-   BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
-   BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
-   BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
BIT_ULL(HW_FEATURE_V4))
 
 #define hw_features_t720 (\
-   BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
-   BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
-   BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
-   BIT_ULL(HW_FEATURE_OPTIMIZED_COVERAGE_MASK) | \
-   BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
-   BIT_ULL(HW_FEATURE_WORKGROUP_ROUND_MULTIPLE_OF_4) | \
-   BIT_ULL(HW_FEATURE_WARPING) | \
BIT_ULL(HW_FEATURE_V4))
 
 
@@ -70,17 +40,6 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
-   BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
-   BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
-   BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
-   BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
-   BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
-   BIT_ULL(HW_FEATURE_MRT) | \
-   BIT_ULL(HW_FEATURE_MSAA_16X) | \
-   BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
-   BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
-   BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
 
 // T860
@@ -88,19 +47,6 @@ enum panfrost_hw_feature {
BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
BIT_ULL(HW_FEATURE_XAFFINITY) | \
-   BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
-   BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
-   BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
-   BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
-   BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
-   BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
-   BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
-   BIT_ULL(HW_FEATURE_MRT) | \
-   BIT_ULL(HW_FEATURE_MSAA_16X) | \
-   BIT_ULL(HW_FEATURE_NEXT_INSTRUC

[PATCH 0/2] drm/panfrost: Clean up our feature lists

2022-01-09 Thread Alyssa Rosenzweig

We've cargo culted a large number of useless feature bits from kbase.
We're about to add support for a number of new Mali GPUs into mainline.
Let's cut down on the copy-paste required and clean up the feature lists
first.

Alyssa Rosenzweig (2):
  drm/panfrost: Remove features meant for userspace
  drm/panfrost: Merge some feature lists

 drivers/gpu/drm/panfrost/panfrost_features.h | 212 +--
 1 file changed, 7 insertions(+), 205 deletions(-)

-- 
2.34.1

[PATCH] drm/panfrost: Update create_bo flags comment

2022-01-09 Thread Alyssa Rosenzweig

Update a comment stating create_bo took no flags, since it now takes a
bit mask of optional flags NOEXEC and HEAP.

Signed-off-by: Alyssa Rosenzweig 
---
 include/uapi/drm/panfrost_drm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 061e700dd06c..9e40277d8185 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -84,14 +84,14 @@ struct drm_panfrost_wait_bo {
__s64 timeout_ns;   /* absolute */
 };
 
+/* Valid flags to pass to drm_panfrost_create_bo */
 #define PANFROST_BO_NOEXEC 1
 #define PANFROST_BO_HEAP   2
 
 /**
  * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
  *
- * There are currently no values for the flags argument, but it may be
- * used in a future extension.
+ * The flags argument is a bit mask of PANFROST_BO_* flags.
  */
 struct drm_panfrost_create_bo {
__u32 size;
-- 
2.34.1

Re: [PATCH 0/2] drm/panfrost: partial support of T628 GPUs

2021-12-23 Thread Alyssa Rosenzweig

> With these patches panfrost is able to drive mali T628 (r1p0) GPU
> on some armv8 SoCs (in particular BE-M1000).
> r0 GPUs are still not supported [yet] (tested with Exynos 5422).

What's needed for r0?

Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs

2021-12-23 Thread Alyssa Rosenzweig

> The kernel driver itself can't guess which jobs need a such a strict
> affinity, so setting proper requirements is the responsibility of
> the userspace (Mesa). However the userspace is not smart enough [yet].
> Therefore this patch applies the above affinity rule to all jobs on
> dual core group GPUs.

What does Mesa need to do for this to work "properly"? What are the
limitations of the approach implemented here? If we need to extend it
down the line with a UABI change, what would that look like?

Thanks,

Alyssa

Re: [PATCH] drm/panfrost: Avoid user size passed to kvmalloc()

2021-12-16 Thread Alyssa Rosenzweig

> This provides an easy method for user
> space to trigger the OOM killer (by temporarily allocating large amounts
> of kernel memory)

panfrost user space has a lot of easy ways to trigger to the OOM killer
unfortunately  if this is something we want to fix there are a lot
more patches coming :(

Re: [PATCH v2] drm/cma-helper: Set VM_DONTEXPAND for mmap

2021-10-13 Thread Alyssa Rosenzweig

> > From: Robin Murphy 
> > 
> > drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
> > will end up calling remap_pfn_range() (which happens to set the relevant
> > vma flag, among others), so in order to make sure expectations around
> > VM_DONTEXPAND are met, let it explicitly set the flag like most other
> > GEM mmap implementations do.
> > 
> > This avoids repeated warnings on a small minority of systems where the
> > display is behind an IOMMU, and has a simple driver which does not
> > override drm_gem_cma_default_funcs. Arm hdlcd is an in-tree affected
> > driver. Out-of-tree, the Apple DCP driver is affected; this fix is
> > required for DCP to be mainlined.
> 
> How/where does this warn? Also there should be a lot more drivers than
> just these two which have an iommu for the display block, so this not
> working is definitely a more wide-spread issue.

To summarize our discussion on IRC:

This fails `WARN_ON(!(vma->vm_flags & VM_DONTEXPAND))` in
drm_gem_mmap_obj. This warning was introduced in Oct 2019.

For a driver to hit this code path, it must use the CMA helpers without
overriding dem_gem_cma_default_funcs, but use CMA backed by a hardware
IOMMU instead of a physical carveout. This means popular drivers don't
hit this warning: normal drivers that use CMA do so with a carveout
instead of an IOMMU, and normal drivers with an IOMMU do not use the
default CMA helpers. hdlcd is one of the few drivers hitting this, but
hdlcd gets very little testing. Seeing as the last significant change to
hdlcd was in May 2019, it's believable that nobody noticed until Robin
hit this WARN and typed out this patch, especially as the driver still
works despite the WARN.

[PATCH v2] drm/cma-helper: Set VM_DONTEXPAND for mmap

2021-10-13 Thread Alyssa Rosenzweig

From: Robin Murphy 

drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
will end up calling remap_pfn_range() (which happens to set the relevant
vma flag, among others), so in order to make sure expectations around
VM_DONTEXPAND are met, let it explicitly set the flag like most other
GEM mmap implementations do.

This avoids repeated warnings on a small minority of systems where the
display is behind an IOMMU, and has a simple driver which does not
override drm_gem_cma_default_funcs. Arm hdlcd is an in-tree affected
driver. Out-of-tree, the Apple DCP driver is affected; this fix is
required for DCP to be mainlined.

Signed-off-by: Robin Murphy 
Reviewed-and-tested-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index d53388199f34..63e48d98263d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -510,6 +510,7 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct 
vm_area_struct *vma)
 */
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
vma->vm_flags &= ~VM_PFNMAP;
+   vma->vm_flags |= VM_DONTEXPAND;
 
cma_obj = to_drm_gem_cma_obj(obj);
 
-- 
2.30.2

Re: [PATCH v2 1/5] [RFC]iommu: Add a IOMMU_DEVONLY protection flag

2021-10-01 Thread Alyssa Rosenzweig

> The IOMMU_DEVONLY flag allows the caller to flag a mappings backed by
> device-private buffers. That means other devices or CPUs are not
> expected to access the physical memory region pointed by the mapping,
> and the MMU driver can safely restrict the shareability domain to the
> device itself.
> 
> Will be used by the ARM MMU driver to flag Mali mappings accessed only
> by the GPU as Inner-shareable.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  include/linux/iommu.h | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d2f3435e7d17..db14781b522f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -31,6 +31,13 @@
>   * if the IOMMU page table format is equivalent.
>   */
>  #define IOMMU_PRIV   (1 << 5)
> +/*
> + * Mapping is only accessed by the device behind the iommu. That means other
> + * devices or CPUs are not expected to access this physical memory region,
> + * and the MMU driver can safely restrict the shareability domain to the
> + * device itself.
> + */
> +#define IOMMU_DEVONLY(1 << 6)
>  
>  struct iommu_ops;
>  struct iommu_group;

This seems totally reasonable to me, but it is well-known that I'm not
on good terms with the iommu subsystem. Let's wait for Robin to NAK :-P

Re: [PATCH v2 4/5] drm/panfrost: Add a PANFROST_BO_GPUONLY flag

2021-10-01 Thread Alyssa Rosenzweig

> > This seems reasonable to me - it matches the kbase
> > BASE_MEM_COHERENT_SYSTEM (only backwards obviously) and it worked
> > reasonably well for the blob.

Oh, is that what that was for? I remember seeing it set on Midgard for
varyings. Good to go full circle now.

> > But I'm wondering if we need to do anything special to deal with the
> > fact we will now have some non-coherent mappings on an otherwise
> > coherent device.
> > 
> > There are certainly some oddities around how these buffers will be
> > mapped into user space if requested, e.g. panfrost_gem_create_object()
> > sets 'map_wc' based on pfdev->coherent which is arguably wrong for
> > GPUONLY. So there are two things we could consider:
> > 
> > a) Actually prevent user space mapping GPUONLY flagged buffers. Which
> > matches the intention of the name.
> 
> I intended to do that, just forgot to add wrappers around
> drm_gem_shmem_{mmap,vmap}() to forbid CPU-mappings on gpuonly buffers.

This feels like the cleaner solution to me.

> > b) Attempt to provide user space with the tools to safely interact with
> > the buffers (this is the kbase approach). This does have the benefit of
> > allowing *mostly* GPU access. An example here is the tiler heap where
> > the CPU could zero out as necessary but mostly the GPU has ownership and
> > the CPU never reads the contents. GPUONLY/DEVONLY might not be the best
> > name in that case.
> 
> Uh, right, I forgot we had to zero the tiler heap on Midgard (most of
> the time done with a WRITE_VALUE job, but there's an exception on some
> old Midgard GPUs IIRC).

"Attempt" is the key word here :|

We indeed only touch the tiler heap from the CPU on v4, and life's too
short to care about new optimizations for v4. Unless the patch is
trivial, my vote is for a) preventing the mappings and only setting
GPUONLY on the tiler_heap starting with v5.

Re: [PATCH] drm/panfrost: Add PANFROST_BO_NO{READ,WRITE} flags

2021-10-01 Thread Alyssa Rosenzweig

> > > > > + /* Executable implies readable */
> > > > > + if ((args->flags & PANFROST_BO_NOREAD) &&
> > > > > + !(args->flags & PANFROST_BO_NOEXEC))
> > > > > + return -EINVAL;
> > > > 
> > > > Generally, executable also implies not-writeable. Should we check that? 
> > > >  
> > > 
> > > We were allowing it until now, so doing that would break the backward
> > > compat, unfortunately.  
> > 
> > Not a problem if you only enforce this starting with the appropriate
> > UABI version, but...
> 
> I still don't see how that solves the 
> situation, since old-userspace doesn't know about the new UABI, and
> there's no version field on the CREATE_BO ioctl() to let the kernel
> know about the UABI used by this userspace program. I mean, we could
> add one, or add a new PANFROST_BO_EXTENDED_FLAGS flag to enforce this
> 'noexec implies nowrite' behavior, but is it really simpler than
> explicitly passing the NOWRITE flag when NOEXEC is passed?

For some reason I thought the ABI version was negotiated (it is in
kbase). Don't worry about it.

That commit is

Reviewed-by: Alyssa Rosenzweig

Re: [PATCH] drm/panfrost: Add PANFROST_BO_NO{READ,WRITE} flags

2021-09-30 Thread Alyssa Rosenzweig

> > > + /* Executable implies readable */
> > > + if ((args->flags & PANFROST_BO_NOREAD) &&
> > > + !(args->flags & PANFROST_BO_NOEXEC))
> > > + return -EINVAL;  
> > 
> > Generally, executable also implies not-writeable. Should we check that?
> 
> We were allowing it until now, so doing that would break the backward
> compat, unfortunately.

Not a problem if you only enforce this starting with the appropriate
UABI version, but...

> Steve also mentioned that the DDK might use shaders modifying other
> shaders here [1]

What? I believe it, but what?

For the case of pilot shaders, that shouldn't require self-modifying
code. As I understand, the DDK binds the push uniform (FAU / RMU) buffer
as global shader memory (SSBO) and uses regular STORE instructions on
it. That requires writability on that BO but that should be fine.

Re: [PATCH] drm/panfrost: Add PANFROST_BO_NO{READ,WRITE} flags

2021-09-30 Thread Alyssa Rosenzweig

> + /* Executable implies readable */
> + if ((args->flags & PANFROST_BO_NOREAD) &&
> + !(args->flags & PANFROST_BO_NOEXEC))
> + return -EINVAL;

Generally, executable also implies not-writeable. Should we check that?

Re: [PATCH 5/9] drm/panfrost: simplify getting .driver_data

2021-09-20 Thread Alyssa Rosenzweig

Reviewed-by: Alyssa Rosenzweig 

> index bd9b7be63b0f..fd4309209088 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> @@ -400,8 +400,7 @@ void panfrost_device_reset(struct panfrost_device *pfdev)
>  #ifdef CONFIG_PM
>  int panfrost_device_resume(struct device *dev)
>  {
> - struct platform_device *pdev = to_platform_device(dev);
> - struct panfrost_device *pfdev = platform_get_drvdata(pdev);
> + struct panfrost_device *pfdev = dev_get_drvdata(dev);
>  
>   panfrost_device_reset(pfdev);
>   panfrost_devfreq_resume(pfdev);
> @@ -411,8 +410,7 @@ int panfrost_device_resume(struct device *dev)
>  
>  int panfrost_device_suspend(struct device *dev)
>  {
> - struct platform_device *pdev = to_platform_device(dev);
> - struct panfrost_device *pfdev = platform_get_drvdata(pdev);
> + struct panfrost_device *pfdev = dev_get_drvdata(dev);
>  
>   if (!panfrost_job_is_idle(pfdev))
>   return -EBUSY;
> -- 
> 2.30.2
>

Re: [PATCH v2] drm/panfrost: Calculate lock region size correctly

2021-09-15 Thread Alyssa Rosenzweig

Took me a careful read, but this is

Reviewed-by: Alyssa Rosenzweig 

Thanks for hunting this down!

Re: [PATCH 1/5] drm: Add drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

> Missing documentation :-)

Ack.

> > +static inline int drm_fixed_16_16(s32 mult, s32 div)
> 
> You should return a s32.

Ack.

> The function name isn't very explicit, and departs from the naming
> scheme of other functions in the same file. As fixed-point numbers are
> stored in a s64 for the drm_fixp_* helpers, we shouldn't rese the
> drm_fixp_ prefix, maybe drm_fixp_s16_16_ would be a good prefix. The
> function should probably be named drm_fixp_s16_16 from_fraction() then,
> but then the same logic should possibly be replicated to ensure optimal
> precision. I wonder if it wouldn't be best to simply use
> drm_fixp_from_fraction() and shift the result right by 16 bits.

Sure, I'm not attached to the naming ... will wait to hear what colours
everyone else wants the bikehed painted.

As for the implementation, I just went with what was used across
multiple drivers already (no chance of regressions that way) but could
reuse other helpers if it's better..? If the behaviour changes this goes
from a trivial cleanup to a much more invasive changeset. I don't own
half of the hardware here.

[PATCH 5/5] drm/zte: Use common drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

Replace our open-coded FRAC_16_16 with the common drm_fixed_16_16
helper to reduce code duplication between drivers.

Signed-off-by: Alyssa Rosenzweig 
---
 drivers/gpu/drm/zte/zx_plane.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/zte/zx_plane.c b/drivers/gpu/drm/zte/zx_plane.c
index 93bcca428e35..80f61d79be83 100644
--- a/drivers/gpu/drm/zte/zx_plane.c
+++ b/drivers/gpu/drm/zte/zx_plane.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "zx_common_regs.h"
 #include "zx_drm_drv.h"
@@ -43,8 +44,6 @@ static const uint32_t vl_formats[] = {
 */
 };
 
-#define FRAC_16_16(mult, div)(((mult) << 16) / (div))
-
 static int zx_vl_plane_atomic_check(struct drm_plane *plane,
struct drm_atomic_state *state)
 {
@@ -53,8 +52,8 @@ static int zx_vl_plane_atomic_check(struct drm_plane *plane,
struct drm_framebuffer *fb = plane_state->fb;
struct drm_crtc *crtc = plane_state->crtc;
struct drm_crtc_state *crtc_state;
-   int min_scale = FRAC_16_16(1, 8);
-   int max_scale = FRAC_16_16(8, 1);
+   int min_scale = drm_fixed_16_16(1, 8);
+   int max_scale = drm_fixed_16_16(8, 1);
 
if (!crtc || WARN_ON(!fb))
return 0;
-- 
2.30.2

[PATCH 4/5] drm/rockchip: Use common drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

Replace our open-coded FRAC_16_16 with the common drm_fixed_16_16
helper to reduce code duplication between drivers.

Signed-off-by: Alyssa Rosenzweig 
Cc: linux-rockc...@lists.infradead.org
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 9 +
 drivers/gpu/drm/rockchip/rockchip_drm_vop.h | 1 -
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index ba9e14da41b4..9428fcba400f 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_DRM_ANALOGIX_DP
 #include 
@@ -789,9 +790,9 @@ static int vop_plane_atomic_check(struct drm_plane *plane,
struct vop_win *vop_win = to_vop_win(plane);
const struct vop_win_data *win = vop_win->data;
int ret;
-   int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
+   int min_scale = win->phy->scl ? drm_fixed_16_16(1, 8) :
DRM_PLANE_HELPER_NO_SCALING;
-   int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
+   int max_scale = win->phy->scl ? drm_fixed_16_16(8, 1) :
DRM_PLANE_HELPER_NO_SCALING;
 
if (!crtc || WARN_ON(!fb))
@@ -1037,9 +1038,9 @@ static int vop_plane_atomic_async_check(struct drm_plane 
*plane,

 plane);
struct vop_win *vop_win = to_vop_win(plane);
const struct vop_win_data *win = vop_win->data;
-   int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
+   int min_scale = win->phy->scl ? drm_fixed_16_16(1, 8) :
DRM_PLANE_HELPER_NO_SCALING;
-   int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
+   int max_scale = win->phy->scl ? drm_fixed_16_16(8, 1) :
DRM_PLANE_HELPER_NO_SCALING;
struct drm_crtc_state *crtc_state;
 
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
index 857d97cdc67c..cada12e653cc 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
@@ -335,7 +335,6 @@ enum vop_pol {
DEN_NEGATIVE   = 2
 };
 
-#define FRAC_16_16(mult, div)(((mult) << 16) / (div))
 #define SCL_FT_DEFAULT_FIXPOINT_SHIFT  12
 #define SCL_MAX_VSKIPLINES 4
 #define MIN_SCL_FT_AFTER_VSKIP 1
-- 
2.30.2

[PATCH 3/5] drm/msm: Use common drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

Replace our open-coded FRAC_16_16 with the common drm_fixed_16_16
helper to reduce code duplication between drivers.

Signed-off-by: Alyssa Rosenzweig 
Cc: linux-arm-...@vger.kernel.org
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c  | 2 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 8 
 drivers/gpu/drm/msm/msm_drv.h  | 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
index c989621209aa..fc9a9f544110 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
@@ -964,7 +964,7 @@ static int dpu_plane_atomic_check(struct drm_plane *plane,
crtc_state = drm_atomic_get_new_crtc_state(state,
   
new_plane_state->crtc);
 
-   min_scale = FRAC_16_16(1, pdpu->pipe_sblk->maxupscale);
+   min_scale = drm_fixed_16_16(1, pdpu->pipe_sblk->maxupscale);
ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state,
  min_scale,
  pdpu->pipe_sblk->maxdwnscale 
<< 16,
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c
index c6b69afcbac8..079b0662ee3c 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c
@@ -199,8 +199,8 @@ static int mdp5_plane_atomic_check_with_state(struct 
drm_crtc_state *crtc_state,
return -ERANGE;
}
 
-   min_scale = FRAC_16_16(1, 8);
-   max_scale = FRAC_16_16(8, 1);
+   min_scale = drm_fixed_16_16(1, 8);
+   max_scale = drm_fixed_16_16(8, 1);
 
ret = drm_atomic_helper_check_plane_state(state, crtc_state,
  min_scale, max_scale,
@@ -381,8 +381,8 @@ static int mdp5_plane_atomic_async_check(struct drm_plane 
*plane,
plane->state->fb != new_plane_state->fb)
return -EINVAL;
 
-   min_scale = FRAC_16_16(1, 8);
-   max_scale = FRAC_16_16(8, 1);
+   min_scale = drm_fixed_16_16(1, 8);
+   max_scale = drm_fixed_16_16(8, 1);
 
ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state,
  min_scale, max_scale,
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 8b005d1ac899..b5aa94024a42 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct msm_kms;
 struct msm_gpu;
@@ -51,8 +52,6 @@ struct msm_disp_state;
 #define MAX_BRIDGES8
 #define MAX_CONNECTORS 8
 
-#define FRAC_16_16(mult, div)(((mult) << 16) / (div))
-
 struct msm_file_private {
rwlock_t queuelock;
struct list_head submitqueues;
-- 
2.30.2

[PATCH 2/5] drm/meson: Use common drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

Replace our open-coded FRAC_16_16 with the common drm_fixed_16_16
helper to reduce code duplication between drivers.

Signed-off-by: Alyssa Rosenzweig 
Cc: linux-amlo...@lists.infradead.org
---
 drivers/gpu/drm/meson/meson_overlay.c | 7 +++
 drivers/gpu/drm/meson/meson_plane.c   | 5 ++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/meson/meson_overlay.c 
b/drivers/gpu/drm/meson/meson_overlay.c
index dfef8afcc245..d8fc6bbb332f 100644
--- a/drivers/gpu/drm/meson/meson_overlay.c
+++ b/drivers/gpu/drm/meson/meson_overlay.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "meson_overlay.h"
 #include "meson_registers.h"
@@ -162,8 +163,6 @@ struct meson_overlay {
 };
 #define to_meson_overlay(x) container_of(x, struct meson_overlay, base)
 
-#define FRAC_16_16(mult, div)(((mult) << 16) / (div))
-
 static int meson_overlay_atomic_check(struct drm_plane *plane,
  struct drm_atomic_state *state)
 {
@@ -181,8 +180,8 @@ static int meson_overlay_atomic_check(struct drm_plane 
*plane,
 
return drm_atomic_helper_check_plane_state(new_plane_state,
   crtc_state,
-  FRAC_16_16(1, 5),
-  FRAC_16_16(5, 1),
+  drm_fixed_16_16(1, 5),
+  drm_fixed_16_16(5, 1),
   true, true);
 }
 
diff --git a/drivers/gpu/drm/meson/meson_plane.c 
b/drivers/gpu/drm/meson/meson_plane.c
index 8640a8a8a469..4fae9ebbf178 100644
--- a/drivers/gpu/drm/meson/meson_plane.c
+++ b/drivers/gpu/drm/meson/meson_plane.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "meson_plane.h"
 #include "meson_registers.h"
@@ -68,8 +69,6 @@ struct meson_plane {
 };
 #define to_meson_plane(x) container_of(x, struct meson_plane, base)
 
-#define FRAC_16_16(mult, div)(((mult) << 16) / (div))
-
 static int meson_plane_atomic_check(struct drm_plane *plane,
struct drm_atomic_state *state)
 {
@@ -92,7 +91,7 @@ static int meson_plane_atomic_check(struct drm_plane *plane,
 */
return drm_atomic_helper_check_plane_state(new_plane_state,
   crtc_state,
-  FRAC_16_16(1, 5),
+  drm_fixed_16_16(1, 5),
   DRM_PLANE_HELPER_NO_SCALING,
   false, true);
 }
-- 
2.30.2

[PATCH 1/5] drm: Add drm_fixed_16_16 helper

2021-09-01 Thread Alyssa Rosenzweig

This constructs a fixed 16.16 rational, useful to specify the minimum
and maximum scaling in drm_atomic_helper_check_plane_state. It is
open-coded as a macro in multiple drivers, so let's share the helper.

Signed-off-by: Alyssa Rosenzweig 
---
 include/drm/drm_fixed.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/drm/drm_fixed.h b/include/drm/drm_fixed.h
index 553210c02ee0..df1f369b4918 100644
--- a/include/drm/drm_fixed.h
+++ b/include/drm/drm_fixed.h
@@ -208,4 +208,9 @@ static inline s64 drm_fixp_exp(s64 x)
return sum;
 }
 
+static inline int drm_fixed_16_16(s32 mult, s32 div)
+{
+   return (mult << 16) / div;
+}
+
 #endif
-- 
2.30.2

[PATCH] drm/plane: Fix comment typo

2021-08-29 Thread Alyssa Rosenzweig

Minor typofix noticed when reading the KMS documentation.

Signed-off-by: Alyssa Rosenzweig 
---
 include/drm/drm_plane.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index fed97e35626f..0c1102dc4d88 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -43,7 +43,7 @@ enum drm_scaling_filter {
 /**
  * struct drm_plane_state - mutable plane state
  *
- * Please not that the destination coordinates @crtc_x, @crtc_y, @crtc_h and
+ * Please note that the destination coordinates @crtc_x, @crtc_y, @crtc_h and
  * @crtc_w and the source coordinates @src_x, @src_y, @src_h and @src_w are the
  * raw coordinates provided by userspace. Drivers should use
  * drm_atomic_helper_check_plane_state() and only use the derived rectangles in
-- 
2.30.2

Re: [PATCH] drm/cma-helper: Set VM_DONTEXPAND for mmap

2021-08-25 Thread Alyssa Rosenzweig

> drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
> will end up calling remap_pfn_range() (which happens to set the relevant
> vma flag, among others), so in order to make sure expectations around
> VM_DONTEXPAND are met, let it explicitly set the flag like most other
> GEM mmap implementations do.
> 
> This avoids repeated warnings on a small minority of systems where the
> display is behind an IOMMU, and has a simple driver which does not
> override drm_gem_cma_default_funcs.

Apple system-on-chips have their display behind an IOMMU. Actually, a
separate IOMMU for each display, and a separate IOMMU for each display
controller -- so there are 4 IOMMUs total for display on the M1.

I've tested this patch against my work-in-progress display driver for
the M1. It indeed fixes the annoying warnings every frame (wayland) and
on mode setting (x11). So this is

Tested-by: Alyssa Rosenzweig 

I've cherry-picked the patch into my M1 staging/downstream tree, so I
guess that's an Acked-by. I don't know anything about the vm_* stuff in
the kernel yet, though, since can't give a reviewed-by. Will leave that
one to the pro's.

I know you were trying to fix an HDLCD issue, but I needed this patch
too, so thank you!

[PATCH v2] drm/panfrost: Use upper/lower_32_bits helpers

2021-08-25 Thread Alyssa Rosenzweig

Use upper_32_bits/lower_32_bits helpers instead of open-coding them.
This is easier to scan quickly compared to bitwise manipulation, and it
is pleasingly symmetric. I noticed this when debugging lock_region,
which had a particularly "creative" way of writing upper_32_bits.

v2: Use helpers for one more call site and add review tag (Steven).

Signed-off-by: Alyssa Rosenzweig 
Reviewed-by: Rob Herring  (v1)
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_job.c |  8 
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++--
 drivers/gpu/drm/panfrost/panfrost_perfcnt.c |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..763b7abfc88e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -137,8 +137,8 @@ static void panfrost_job_write_affinity(struct 
panfrost_device *pfdev,
 */
affinity = pfdev->features.shader_present;
 
-   job_write(pfdev, JS_AFFINITY_NEXT_LO(js), affinity & 0x);
-   job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32);
+   job_write(pfdev, JS_AFFINITY_NEXT_LO(js), lower_32_bits(affinity));
+   job_write(pfdev, JS_AFFINITY_NEXT_HI(js), upper_32_bits(affinity));
 }
 
 static u32
@@ -203,8 +203,8 @@ static void panfrost_job_hw_submit(struct panfrost_job 
*job, int js)
 
cfg = panfrost_mmu_as_get(pfdev, job->file_priv->mmu);
 
-   job_write(pfdev, JS_HEAD_NEXT_LO(js), jc_head & 0x);
-   job_write(pfdev, JS_HEAD_NEXT_HI(js), jc_head >> 32);
+   job_write(pfdev, JS_HEAD_NEXT_LO(js), lower_32_bits(jc_head));
+   job_write(pfdev, JS_HEAD_NEXT_HI(js), upper_32_bits(jc_head));
 
panfrost_job_write_affinity(pfdev, job->requirements, js);
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..c3fbe0ad9090 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -80,8 +80,8 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
region |= region_width;
 
/* Lock the region that needs to be updated */
-   mmu_write(pfdev, AS_LOCKADDR_LO(as_nr), region & 0xUL);
-   mmu_write(pfdev, AS_LOCKADDR_HI(as_nr), (region >> 32) & 0xUL);
+   mmu_write(pfdev, AS_LOCKADDR_LO(as_nr), lower_32_bits(region));
+   mmu_write(pfdev, AS_LOCKADDR_HI(as_nr), upper_32_bits(region));
write_cmd(pfdev, as_nr, AS_COMMAND_LOCK);
 }
 
@@ -123,14 +123,14 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
 
mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
 
-   mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xUL);
-   mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32);
+   mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), lower_32_bits(transtab));
+   mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), upper_32_bits(transtab));
 
/* Need to revisit mem attrs.
 * NC is the default, Mali driver is inner WT.
 */
-   mmu_write(pfdev, AS_MEMATTR_LO(as_nr), memattr & 0xUL);
-   mmu_write(pfdev, AS_MEMATTR_HI(as_nr), memattr >> 32);
+   mmu_write(pfdev, AS_MEMATTR_LO(as_nr), lower_32_bits(memattr));
+   mmu_write(pfdev, AS_MEMATTR_HI(as_nr), upper_32_bits(memattr));
 
write_cmd(pfdev, as_nr, AS_COMMAND_UPDATE);
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c 
b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
index 5ab03d605f57..e116a4d9b8e5 100644
--- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
+++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c
@@ -51,8 +51,8 @@ static int panfrost_perfcnt_dump_locked(struct 
panfrost_device *pfdev)
 
reinit_completion(&pfdev->perfcnt->dump_comp);
gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT;
-   gpu_write(pfdev, GPU_PERFCNT_BASE_LO, gpuva);
-   gpu_write(pfdev, GPU_PERFCNT_BASE_HI, gpuva >> 32);
+   gpu_write(pfdev, GPU_PERFCNT_BASE_LO, lower_32_bits(gpuva));
+   gpu_write(pfdev, GPU_PERFCNT_BASE_HI, upper_32_bits(gpuva));
gpu_write(pfdev, GPU_INT_CLEAR,
  GPU_IRQ_CLEAN_CACHES_COMPLETED |
  GPU_IRQ_PERFCNT_SAMPLE_COMPLETED);
-- 
2.30.2

Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

2021-08-25 Thread Alyssa Rosenzweig

> > Horrifying, and not what I wanted to read my last day before 2 weeks of
> > leave. Let's drop this patch, hopefully by the time I'm back, your
> > friends in GPU can confirm that's a spec bug and not an actual
> > hardware/driver one...
> > 
> > Can you apply the other 3 patches in the mean time? Thanks :-)
> > 
> 
> Yeah, sure. I'll push the first 3 to drm-misc-next-fixes (should land in
> v5.15).
> 
> It's interesting that if my (new) reading of the spec is correct then
> kbase has been horribly broken in this respect forever. So clearly it
> can't be something that crops up very often. It would have been good if
> the spec could have included wording such as "naturally aligned" if
> that's what was intended.

Indeed. Fingers crossed this is a mix-up. Although the text you quoted
seems pretty clear unfortunately :|

> Enjoy your holiday!

Thanks!

Re: [PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

2021-08-25 Thread Alyssa Rosenzweig

> > In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding
> > the bug. Therefore this doesn't need to be backported. Still, that's a
> > happy accident and not a precondition of lock_region, so we let's do the
> > right thing to future proof.
> 
> Actually it's worse than that due to the hardware behaviour, the spec
> states (for LOCKADDR_BASE):
> 
> > Only the upper bits of the address are used. The address is aligned to a
> > multiple of the region size, so a variable number of low-order bits are
> > ignored, depending on the selected region size. It is recommended that 
> > software
> > ensures that these low bits in the address are cleared, to avoid confusion.
> 
> It appears that indeed this has caused confusion ;)
> 
> So for a simple request like locking from 0xCAFE - 0xCB01 (size
> = 0x3) the region width gets rounded up (to 0x4) which causes
> the start address to be effectively rounded down (by the hardware) to
> 0xCAFC and we fail to lock 0xCB00-0xCB01.
> 
> Interestingly (unless my reading of this is wrong) that means to lock
> 0x-0x10001 (i.e. crossing the 4GB boundary) requires locking
> *at least* 0x-0x2 (i.e. locking the first 8GB).
> 
> This appears to be broken in kbase (which actually does zero out the low
> bits of the address) - I've raised a bug internally so hopefully someone
> will tell me if I've read the spec completely wrong here.

Horrifying, and not what I wanted to read my last day before 2 weeks of
leave. Let's drop this patch, hopefully by the time I'm back, your
friends in GPU can confirm that's a spec bug and not an actual
hardware/driver one...

Can you apply the other 3 patches in the mean time? Thanks :-)

[PATCH v2 4/4] drm/panfrost: Handle non-aligned lock addresses

2021-08-24 Thread Alyssa Rosenzweig

When locking memory, the base address is rounded down to the nearest
page. The current code does not adjust the size in this case,
truncating the lock region:

Input:  [size]
Round: [size]

To fix the truncation, extend the lock region by the amount rounded off.

Input:  [size]
Round: [---size--]

This bug is difficult to hit under current assumptions: since the size
of the lock region is stored as a ceil(log2), the truncation must cause
us to cross a power-of-two boundary. This is possible, for example if
the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The
existing code rounds down the iova to 0xCAFE and rounds up the lock
region to 65536 bytes, locking until 0xCAFF. This fails to lock the
last 15 bytes.

In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding
the bug. Therefore this doesn't need to be backported. Still, that's a
happy accident and not a precondition of lock_region, so we let's do the
right thing to future proof.

Signed-off-by: Alyssa Rosenzweig 
Reported-by: Steven Price 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index dfe5f1d29763..14be32497ec3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
u8 region_width;
u64 region = iova & PAGE_MASK;
 
+   /* After rounding the address down, extend the size to lock the end. */
+   size += (region - iova);
+
/* The size is encoded as ceil(log2) minus(1), which may be calculated
 * with fls. The size must be clamped to hardware bounds.
 */
-- 
2.30.2

[PATCH v2 3/4] drm/panfrost: Clamp lock region to Bifrost minimum

2021-08-24 Thread Alyssa Rosenzweig

When locking a region, we currently clamp to a PAGE_SIZE as the minimum
lock region. While this is valid for Midgard, it is invalid for Bifrost,
where the minimum locking size is 8x larger than the 4k page size. Add a
hardware definition for the minimum lock region size (corresponding to
KBASE_LOCK_REGION_MIN_SIZE_LOG2 in kbase) and respect it.

Signed-off-by: Alyssa Rosenzweig 
Tested-by: Chris Morgan 
Reviewed-by: Steven Price 
Cc: 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 2 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 3a795273e505..dfe5f1d29763 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -66,7 +66,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
/* The size is encoded as ceil(log2) minus(1), which may be calculated
 * with fls. The size must be clamped to hardware bounds.
 */
-   size = max_t(u64, size, PAGE_SIZE);
+   size = max_t(u64, size, AS_LOCK_REGION_MIN_SIZE);
region_width = fls64(size - 1) - 1;
region |= region_width;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 1940ff86e49a..6c5a11ef1ee8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -316,6 +316,8 @@
 #define AS_FAULTSTATUS_ACCESS_TYPE_READ(0x2 << 8)
 #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE   (0x3 << 8)
 
+#define AS_LOCK_REGION_MIN_SIZE (1ULL << 15)
+
 #define gpu_write(dev, reg, data) writel(data, dev->iomem + reg)
 #define gpu_read(dev, reg) readl(dev->iomem + reg)
 
-- 
2.30.2

[PATCH v2 2/4] drm/panfrost: Use u64 for size in lock_region

2021-08-24 Thread Alyssa Rosenzweig

Mali virtual addresses are 48-bit. Use a u64 instead of size_t to ensure
we can express the "lock everything" condition as ~0ULL without
overflow. This code was silently broken on any platform where a size_t
is less than 48-bits; in particular, it was broken on 32-bit armv7
platforms which remain in use with panfrost. (Mainly RK3288)

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Rob Herring 
Tested-by: Chris Morgan 
Reviewed-by: Steven Price 
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index f6e02d0392f4..3a795273e505 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -58,7 +58,7 @@ static int write_cmd(struct panfrost_device *pfdev, u32 
as_nr, u32 cmd)
 }
 
 static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
-   u64 iova, size_t size)
+   u64 iova, u64 size)
 {
u8 region_width;
u64 region = iova & PAGE_MASK;
@@ -78,7 +78,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
 
 
 static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
- u64 iova, size_t size, u32 op)
+ u64 iova, u64 size, u32 op)
 {
if (as_nr < 0)
return 0;
@@ -95,7 +95,7 @@ static int mmu_hw_do_operation_locked(struct panfrost_device 
*pfdev, int as_nr,
 
 static int mmu_hw_do_operation(struct panfrost_device *pfdev,
   struct panfrost_mmu *mmu,
-  u64 iova, size_t size, u32 op)
+  u64 iova, u64 size, u32 op)
 {
int ret;
 
@@ -112,7 +112,7 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
u64 transtab = cfg->arm_mali_lpae_cfg.transtab;
u64 memattr = cfg->arm_mali_lpae_cfg.memattr;
 
-   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, 
AS_COMMAND_FLUSH_MEM);
 
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xUL);
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32);
@@ -128,7 +128,7 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
 
 static void panfrost_mmu_disable(struct panfrost_device *pfdev, u32 as_nr)
 {
-   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, 
AS_COMMAND_FLUSH_MEM);
 
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), 0);
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), 0);
@@ -242,7 +242,7 @@ static size_t get_pgsize(u64 addr, size_t size)
 
 static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
 struct panfrost_mmu *mmu,
-u64 iova, size_t size)
+u64 iova, u64 size)
 {
if (mmu->as < 0)
return;
-- 
2.30.2

[PATCH v2 1/4] drm/panfrost: Simplify lock_region calculation

2021-08-24 Thread Alyssa Rosenzweig

In lock_region, simplify the calculation of the region_width parameter.
This field is the size, but encoded as ceil(log2(size)) - 1.
ceil(log2(size)) may be computed directly as fls(size - 1). However, we
want to use the 64-bit versions as the amount to lock can exceed
32-bits.

This avoids undefined (and completely wrong) behaviour when locking all
memory (size ~0). In this case, the old code would "round up" ~0 to the
nearest page, overflowing to 0. Since fls(0) == 0, this would calculate
a region width of 10 + 0 = 10. But then the code would shift by
(region_width - 11) = -1. As shifting by a negative number is undefined,
UBSAN flags the bug. Of course, even if it were defined the behaviour is
wrong, instead of locking all memory almost none would get locked.

The new form of the calculation corrects this special case and avoids
the undefined behaviour.

Signed-off-by: Alyssa Rosenzweig 
Reported-and-tested-by: Chris Morgan 
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
 {
u8 region_width;
u64 region = iova & PAGE_MASK;
-   /*
-* fls returns:
-* 1 .. 32
-*
-* 10 + fls(num_pages)
-* results in the range (11 .. 42)
-*/
-
-   size = round_up(size, PAGE_SIZE);
 
-   region_width = 10 + fls(size >> PAGE_SHIFT);
-   if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-   /* not pow2, so must go up to the next pow2 */
-   region_width += 1;
-   }
+   /* The size is encoded as ceil(log2) minus(1), which may be calculated
+* with fls. The size must be clamped to hardware bounds.
+*/
+   size = max_t(u64, size, PAGE_SIZE);
+   region_width = fls64(size - 1) - 1;
region |= region_width;
 
/* Lock the region that needs to be updated */
-- 
2.30.2

1 2 3 >

1 - 100 of 250 matches

Mail list logo