Re: [PATCH 1/1] drm/i915/guc: Relax CTB response timeout

2021-06-10 Thread Michal Wajdeczko



On 11.06.2021 02:05, Matthew Brost wrote:
> In upcoming patch we will allow more CTB requests to be sent in
> parallel to the GuC for processing, so we shouldn't assume any more
> that GuC will always reply without 10ms.

s/without/within

> 
> Use a bigger hardcoded value of 1s instead.
> 
> v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
> v3:
>  (Daniel Vetter)
>   - Use hardcoded value of 1s rather than config option

if this is v3 then it's likely still my patch, so I can't give r-b

> 
> Signed-off-by: Matthew Brost 
> Cc: Michal Wajdeczko 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 8f7b148fef58..bc626ca0a9eb 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -475,12 +475,14 @@ static int wait_for_ct_request_update(struct ct_request 
> *req, u32 *status)
>   /*
>* Fast commands should complete in less than 10us, so sample quickly
>* up to that length of time, then switch to a slower sleep-wait loop.
> -  * No GuC command should ever take longer than 10ms.
> +  * No GuC command should ever take longer than 10ms but many GuC
> +  * commands can be in flight at a time, so use a 1s timeout on the slower
> +  * sleep-wait loop.

this is a x100 increase of the timeout, which not only looks nice but
should also cover ~100 CTB messages (of length 10 dwords) in our current
4K send CT buffer, so LGTM

Michal

ps. unless in the future we decide to increase that CT size to something
much bigger; maybe we should then tie this timeout to the number of
possible concurrent messages in flight? not a blocker
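The back-of-the-envelope estimate above can be worked through directly. The sketch below is a sanity check, not driver code: the constants are the illustrative figures quoted in this thread (4K send buffer, 10-dword messages, 10ms worst case per command), not values read from i915.

```c
#include <assert.h>

/* Back-of-envelope check for the 1s CTB response timeout, using the
 * illustrative numbers from the review: a 4K send buffer full of
 * 10-dword messages, each taking the 10ms worst case to process. */

enum {
    CT_SEND_BUF_BYTES = 4096, /* current 4K send CT buffer */
    MSG_DWORDS        = 10,   /* typical CTB message length */
    BYTES_PER_DWORD   = 4,
    WORST_MS_PER_CMD  = 10,   /* "no GuC command should take longer" */
};

static int max_inflight_messages(void)
{
    return CT_SEND_BUF_BYTES / (MSG_DWORDS * BYTES_PER_DWORD);
}

static int worst_case_drain_ms(void)
{
    return max_inflight_messages() * WORST_MS_PER_CMD;
}
```

With these figures max_inflight_messages() is ~102 and the worst-case drain is ~1020ms, which is why the x100 bump to a 1s slow-wait timeout lines up with a completely full send buffer.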

>*/
>  #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
>   err = wait_for_us(done, 10);
>   if (err)
> - err = wait_for(done, 10);
> + err = wait_for(done, 1000);
>  #undef done
>  
>   if (unlikely(err))
> 


Re: [Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence

2021-06-10 Thread Christian König

Am 10.06.21 um 22:42 schrieb Daniel Vetter:

On Thu, Jun 10, 2021 at 10:10 PM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter  wrote:

On Thu, Jun 10, 2021 at 11:39 AM Christian König
 wrote:

Am 10.06.21 um 11:29 schrieb Tvrtko Ursulin:

On 09/06/2021 22:29, Jason Ekstrand wrote:

We've tried to keep it somewhat contained by doing most of the hard work
to prevent access of recycled objects via dma_fence_get_rcu_safe().
However, a quick grep of kernel sources says that, of the 30 instances
of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
It's likely there are bear traps in DRM and related subsystems just waiting
for someone to accidentally step in them.

...because dma_fence_get_rcu_safe appears to be about whether the
*pointer* to the fence itself is rcu protected, not about the fence
object itself.

Yes, exactly that.

The fact that both of you think this means either that I've completely
missed what's going on with RCU here (possible but, in this case, I
think unlikely) or that RCU on dma fences should scare us all.

Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
such,  I'd like to ask a slightly different question:  What are the
rules about what is allowed to be done under the RCU read lock and
what guarantees does a driver need to provide?

I think so far that we've all agreed on the following:

  1. Freeing an unsignaled fence is ok as long as it doesn't have any
pending callbacks.  (Callbacks should hold a reference anyway).

  2. The pointer race solved by dma_fence_get_rcu_safe is real and
requires the loop to sort out.
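The pointer race in item 2 can be modeled outside the kernel. Below is a single-threaded, userspace caricature of the retry pattern behind dma_fence_get_rcu_safe(): grab the pointer, take a reference, then re-check that the pointer did not move underneath us; if it did, drop the stale reference and retry. All names and the injected "writer" callback are illustrative, not kernel API, and the kref_get_unless_zero() zero-refcount subtlety is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

struct fence {
    int refcount;
};

static struct fence fence_a = { 1 }, fence_b = { 1 };
static struct fence *slot = &fence_a;  /* models the __rcu fence pointer */
static int retries;

static void swap_slot(void)
{
    slot = &fence_b;                   /* models a concurrent writer */
}

static struct fence *get_fence_safe(void (*interference)(void))
{
    for (;;) {
        struct fence *f = slot;
        if (!f)
            return NULL;
        f->refcount++;                 /* dma_fence_get_rcu() stand-in */
        if (interference) {
            interference();            /* the writer races us exactly once */
            interference = NULL;
        }
        if (slot == f)
            return f;                  /* pointer stable: ref is valid */
        f->refcount--;                 /* lost the race: drop and retry */
        retries++;
    }
}
```

Calling get_fence_safe(swap_slot) takes a reference on fence_a, observes the slot move to fence_b, drops the stale reference and retries, ending with a valid reference on fence_b after one retry — which is exactly the loop item 2 says is required.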

But let's say I have a dma_fence pointer that I got from, say, calling
dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
with it under the RCU lock?  What assumptions can I make?  Is this
code, for instance, ok?

rcu_read_lock();
fence = dma_resv_excl_fence(obj);
idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
rcu_read_unlock();

This code very much looks correct under the following assumptions:

  1. A valid fence pointer stays alive under the RCU read lock
  2. SIGNALED_BIT is set-once (it's never unset after being set).

However, if it were, we wouldn't have dma_resv_test_signaled(), now
would we? :-)

The moment you introduce ANY dma_fence recycling that recycles a
dma_fence within a single RCU grace period, all your assumptions break
down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
also have a little i915_request recycler to try and help with memory
pressure scenarios in certain critical sections that also doesn't
respect RCU grace periods.  And, as mentioned multiple times, our
recycling leaks into every other driver because, thanks to i915's
choice, the above 4-line code snippet isn't valid ANYWHERE in the
kernel.

So the question I'm raising isn't so much about the rules today.
Today, we live in the wild wild west where everything is YOLO.  But
where do we want to go?  Do we like this wild west world?  Do we want
more consistency under the RCU read lock?  If so, what do we want the
rules to be?

One option would be to accept the wild-west world we live in and say
"The RCU read lock gains you nothing.  If you want to touch the guts
of a dma_fence, take a reference".  But, at that point, we're eating
two atomics for every time someone wants to look at a dma_fence.  Do
we want that?

Alternatively, and this is what I think Daniel and I were trying to
propose here, is that we place some constraints on dma_fence
recycling.  Specifically that, under the RCU read lock, the fence
doesn't suddenly become a new fence.  All of the immutability and
once-mutability guarantees of various bits of dma_fence hold as long
as you have the RCU read lock.

Yeah this is suboptimal. Too many potential bugs, not enough benefits.

This entire __rcu business started so that there would be a lockless
way to get at fences, or at least the exclusive one. That did not
really pan out. I think we have a few options:

- drop the idea of rcu/lockless dma-fence access outright. A quick
sequence of grabbing the lock, acquiring the dma_fence and then
dropping your lock again is probably plenty good. There's a lot of
call_rcu and other stuff we could probably delete. I have no idea what
the perf impact across all the drivers would be.


The question is maybe not the perf impact, but rather whether that is
possible at all.


IIRC we now have some cases in TTM where RCU is mandatory and we simply 
don't have any other choice than using it.



- try to make all drivers follow some stricter rules. The trouble is
that at least with radeon dma_fence callbacks aren't even very
reliable (that's why it has its own dma_fence_wait implementation), so
things are wobbly anyway.

- live with the current situation, but radically delete all unsafe
interfaces. I.e. nothing is allowed to directly deref an rcu fence
pointer, everything goes through dma_fence_g

Re: [PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Tomi Valkeinen

On 11/06/2021 08:54, Maxime Ripard wrote:

Hi,

On Thu, Jun 10, 2021 at 11:00:05PM +0200, Daniel Vetter wrote:

On Thu, Jun 10, 2021 at 7:47 PM Maxime Ripard  wrote:


New KMS properties come with a bunch of requirements to prevent each
driver from running its own, inconsistent, set of properties,
eventually leading to issues like property conflicts, inconsistencies
between drivers and semantics, etc.

Let's document what we expect.

Cc: Alexandre Belloni 
Cc: Alexandre Torgue 
Cc: Alex Deucher 
Cc: Alison Wang 
Cc: Alyssa Rosenzweig 
Cc: Andrew Jeffery 
Cc: Andrzej Hajda 
Cc: Anitha Chrisanthus 
Cc: Benjamin Gaignard 
Cc: Ben Skeggs 
Cc: Boris Brezillon 
Cc: Brian Starkey 
Cc: Chen Feng 
Cc: Chen-Yu Tsai 
Cc: Christian Gmeiner 
Cc: "Christian König" 
Cc: Chun-Kuang Hu 
Cc: Edmund Dea 
Cc: Eric Anholt 
Cc: Fabio Estevam 
Cc: Gerd Hoffmann 
Cc: Haneen Mohammed 
Cc: Hans de Goede 
Cc: "Heiko Stübner" 
Cc: Huang Rui 
Cc: Hyun Kwon 
Cc: Inki Dae 
Cc: Jani Nikula 
Cc: Jernej Skrabec 
Cc: Jerome Brunet 
Cc: Joel Stanley 
Cc: John Stultz 
Cc: Jonas Karlman 
Cc: Jonathan Hunter 
Cc: Joonas Lahtinen 
Cc: Joonyoung Shim 
Cc: Jyri Sarha 
Cc: Kevin Hilman 
Cc: Kieran Bingham 
Cc: Krzysztof Kozlowski 
Cc: Kyungmin Park 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Liviu Dudau 
Cc: Lucas Stach 
Cc: Ludovic Desroches 
Cc: Marek Vasut 
Cc: Martin Blumenstingl 
Cc: Matthias Brugger 
Cc: Maxime Coquelin 
Cc: Maxime Ripard 
Cc: Melissa Wen 
Cc: Neil Armstrong 
Cc: Nicolas Ferre 
Cc: "Noralf Trønnes" 
Cc: NXP Linux Team 
Cc: Oleksandr Andrushchenko 
Cc: Patrik Jakobsson 
Cc: Paul Cercueil 
Cc: Pengutronix Kernel Team 
Cc: Philippe Cornu 
Cc: Philipp Zabel 
Cc: Qiang Yu 
Cc: Rob Clark 
Cc: Robert Foss 
Cc: Rob Herring 
Cc: Rodrigo Siqueira 
Cc: Rodrigo Vivi 
Cc: Roland Scheidegger 
Cc: Russell King 
Cc: Sam Ravnborg 
Cc: Sandy Huang 
Cc: Sascha Hauer 
Cc: Sean Paul 
Cc: Seung-Woo Kim 
Cc: Shawn Guo 
Cc: Stefan Agner 
Cc: Steven Price 
Cc: Sumit Semwal 
Cc: Thierry Reding 
Cc: Tian Tao 
Cc: Tomeu Vizoso 
Cc: Tomi Valkeinen 
Cc: VMware Graphics 
Cc: Xinliang Liu 
Cc: Xinwei Kong 
Cc: Yannick Fertre 
Cc: Zack Rusin 
Reviewed-by: Daniel Vetter 
Signed-off-by: Maxime Ripard 

---

Changes from v2:
   - Take into account the feedback from Laurent and Liviu to no longer
 force generic properties, but prefix vendor-specific properties with
 the vendor name


I'm pretty sure my r-b was without this ...


Yeah, sorry. I wanted to tell you on IRC that you wanted to have a
second look, but I shouldn't have kept it and caught you by surprise
indeed.


Why exactly do we need this? KMS is meant to be fairly generic (bugs
throw a wrench around here sometimes, and semantics can be tricky). If
we open up the door to yolo vendor properties in upstream, then that
goal is pretty much written off. And we've been there with vendor
properties, it's a gigantic mess.

Minimally drop my r-b, I'm definitely not in support of this idea.


So the argument Liviu and Laurent made was that in some cases, getting a
generic property right with only a couple of users is hard. So they
advocated for the right to keep non-generic properties. I can get the
argument, and no-one else said that was wrong, so it felt like the
consensus was there.


I also think that (maybe mainly on embedded side) we may have 1) 
esoteric HW features which perhaps can't even be made generic, and 2) 
features which may or may not be generic, but for which support cannot 
be added to any common opensource userspace projects like X or Weston, 
as the only use cases for the features are specialized low level apps 
(often customer's closed-source apps).


While I agree with Daniel's "gigantic mess" problem, it would also be 
quite nice to have a way to support all the HW features upstream instead 
of carrying them in vendor trees.


 Tomi


Re: [PULL] drm-misc-next

2021-06-10 Thread Thomas Zimmermann

Hi

Am 10.06.21 um 15:32 schrieb Daniel Vetter:

On Thu, Jun 10, 2021 at 1:15 PM Thomas Zimmermann  wrote:


Hi Dave and Daniel,

here's the second PR for drm-misc-next for this week, and the final one
for 5.14. I backmerged drm-next for the TTM changes. As for highlights
nouveau now has eDP backlight support and udmabuf supports huge pages.


Why did you do this backmerge? It's done now so nothing to fix, but
I'm not really seeing the reason - the backmerge is the last patch
right before you've done the pull request.


From what I understood, there was a TTM change (coming from intel-gt) 
that created significant conflicts between trees. I backmerged to get 
these changes into drm-misc-next. If the drm-next side was outdated, 
people shouldn't have to make patches against it.


Best regards
Thomas


-Daniel



Best regards
Thomas

drm-misc-next-2021-06-10:
drm-misc-next for 5.14:

UAPI Changes:

Cross-subsystem Changes:

  * dma-buf: Support huge pages in udmabuf

Core Changes:

  * Backmerge of drm/drm-next

  * drm/dp: Import eDP backlight code from i915

Driver Changes:

  * drm/bridge: TI SN65DSI83: Fix sparse warnings

  * drm/i915: Cleanup eDP backlight code before moving it into helper

  * drm/nouveau: Support DPCD backlights; Fix GEM init for internal BOs
The following changes since commit c707b73f0cfb1acc94a20389aecde65e6385349b:

   Merge tag 'amd-drm-next-5.14-2021-06-09' of 
https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-06-10 13:47:13 
+1000)

are available in the Git repository at:

   git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-next-2021-06-10

for you to fetch changes up to 86441fa29e57940eeb00f35fefb1853c1fbe67bb:

   Merge drm/drm-next into drm-misc-next (2021-06-10 12:18:54 +0200)


drm-misc-next for 5.14:

UAPI Changes:

Cross-subsystem Changes:

  * dma-buf: Support huge pages in udmabuf

Core Changes:

  * Backmerge of drm/drm-next

  * drm/dp: Import eDP backlight code from i915

Driver Changes:

  * drm/bridge: TI SN65DSI83: Fix sparse warnings

  * drm/i915: Cleanup eDP backlight code before moving it into helper

  * drm/nouveau: Support DPCD backlights; Fix GEM init for internal BOs


Christian König (1):
   drm/nouveau: init the base GEM fields for internal BOs

Lyude Paul (9):
   drm/i915/dpcd_bl: Remove redundant AUX backlight frequency calculations
   drm/i915/dpcd_bl: Handle drm_dpcd_read/write() return values correctly
   drm/i915/dpcd_bl: Cleanup intel_dp_aux_vesa_enable_backlight() a bit
   drm/i915/dpcd_bl: Cache some backlight capabilities in 
intel_panel.backlight
   drm/i915/dpcd_bl: Move VESA backlight enabling code closer together
   drm/i915/dpcd_bl: Return early in vesa_calc_max_backlight if we can't 
read PWMGEN_BIT_COUNT
   drm/i915/dpcd_bl: Print return codes for VESA backlight failures
   drm/dp: Extract i915's eDP backlight code into DRM helpers
   drm/nouveau/kms/nv50-: Add basic DPCD backlight support for nouveau

Marek Vasut (1):
   drm/bridge: ti-sn65dsi83: Fix sparse warnings

Thomas Zimmermann (1):
   Merge drm/drm-next into drm-misc-next

Vivek Kasireddy (1):
   udmabuf: Add support for mapping hugepages (v4)

  drivers/dma-buf/udmabuf.c  |  50 ++-
  drivers/gpu/drm/bridge/ti-sn65dsi83.c  |  21 +-
  drivers/gpu/drm/drm_dp_helper.c| 347 +
  drivers/gpu/drm/i915/display/intel_display_types.h |   2 +-
  .../gpu/drm/i915/display/intel_dp_aux_backlight.c  | 329 +++
  drivers/gpu/drm/nouveau/dispnv50/disp.c|  28 ++
  drivers/gpu/drm/nouveau/nouveau_backlight.c| 166 +-
  drivers/gpu/drm/nouveau/nouveau_bo.c   |   6 +
  drivers/gpu/drm/nouveau/nouveau_connector.h|   9 +-
  drivers/gpu/drm/nouveau/nouveau_encoder.h  |   1 +
  include/drm/drm_dp_helper.h|  48 +++
  11 files changed, 682 insertions(+), 325 deletions(-)

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer






--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





[PATCH] drm/i915: Add relocation exceptions for two other platforms

2021-06-10 Thread Zbigniew Kempczyński
We have previously established that we stop using relocations starting
from gen12 platforms, with Tigerlake as an exception. We keep this
policy, but we want to enable relocations conditionally for
Alderlake S/P under the require_force_probe flag.

Keeping relocations under the require_force_probe flag is an interim
solution until the IGTs are rewritten to use softpin.

v2: - remove inline from function definition (Jani)
- fix indentation

v3: change to GRAPHICS_VER() (Zbigniew)

v4: remove RKL from flag as it is already shipped (Rodrigo)

Signed-off-by: Zbigniew Kempczyński 
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Cc: Rodrigo Vivi 
Acked-by: Dave Airlie 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 23 +++
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index a8abc9af5ff4..81064914640f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -491,16 +491,29 @@ eb_unreserve_vma(struct eb_vma *ev)
ev->flags &= ~__EXEC_OBJECT_RESERVED;
 }
 
+static bool platform_has_relocs_enabled(const struct i915_execbuffer *eb)
+{
+   /*
+* Relocations are disallowed starting from gen12 with Tigerlake
+* as an exception. We temporarily allow relocations for Alderlake
+* when the require_force_probe flag is set.
+*/
+   if (GRAPHICS_VER(eb->i915) < 12 || IS_TIGERLAKE(eb->i915))
+   return true;
+
+   if (INTEL_INFO(eb->i915)->require_force_probe &&
+   (IS_ALDERLAKE_S(eb->i915) || IS_ALDERLAKE_P(eb->i915)))
+   return true;
+
+   return false;
+}
+
 static int
 eb_validate_vma(struct i915_execbuffer *eb,
struct drm_i915_gem_exec_object2 *entry,
struct i915_vma *vma)
 {
-   /* Relocations are disallowed for all platforms after TGL-LP.  This
-* also covers all platforms with local memory.
-*/
-   if (entry->relocation_count &&
-   GRAPHICS_VER(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915))
+   if (entry->relocation_count && !platform_has_relocs_enabled(eb))
return -EINVAL;
 
if (unlikely(entry->flags & eb->invalid_flags))
-- 
2.26.0
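The gating in the patch reduces to a small predicate that is easy to tabulate. The sketch below restates platform_has_relocs_enabled() over a plain struct standing in for drm_i915_private / INTEL_INFO() — the struct and its fields are illustrative, only the decision logic mirrors the diff above.

```c
#include <assert.h>
#include <stdbool.h>

struct platform {
    int  graphics_ver;
    bool is_tigerlake;
    bool is_alderlake_s;
    bool is_alderlake_p;
    bool require_force_probe;
};

static bool platform_has_relocs_enabled(const struct platform *p)
{
    /* Relocations stay allowed before gen12 and on Tigerlake. */
    if (p->graphics_ver < 12 || p->is_tigerlake)
        return true;

    /* ADL-S/ADL-P keep them only while still behind require_force_probe. */
    if (p->require_force_probe &&
        (p->is_alderlake_s || p->is_alderlake_p))
        return true;

    return false;
}
```

Note the consequence of the second clause: an Alderlake part silently loses relocations the moment require_force_probe is cleared, which is why the flag here is explicitly an interim measure.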



Re: [Intel-gfx] [PATCH] drm/i915: Add relocation exceptions for two other platforms

2021-06-10 Thread Zbigniew Kempczyński
On Thu, Jun 10, 2021 at 10:36:12AM -0400, Rodrigo Vivi wrote:
> On Thu, Jun 10, 2021 at 12:39:55PM +0200, Zbigniew Kempczyński wrote:
> > We have previously established that we stop using relocations starting
> > from gen12 platforms, with Tigerlake as an exception. We keep this
> > policy, but we want to enable relocations conditionally for
> > Rocketlake and Alderlake under the require_force_probe flag.
> > 
> > Keeping relocations under the require_force_probe flag is an interim
> > solution until the IGTs are rewritten to use softpin.
> 
> hmm... to be really honest I'm not so happy that we are introducing
> a new criteria to the force_probe.
> 
> The criteria was to have a functional driver and not to track uapi.
> 
> But on the other hand I do recognize that the current definition
> of the flag allows that, because we have established that with
> this behavior, the "driver for new Intel graphics devices that
> are recognized but not properly supported by this kernel version"
> (as stated in the Kconfig for the DRM_I915_FORCE_PROBE).
> 
> However...
> 
> > 
> > v2: - remove inline from function definition (Jani)
> > - fix indentation
> > 
> > v3: change to GRAPHICS_VER() (Zbigniew)
> > 
> > Signed-off-by: Zbigniew Kempczyński 
> > Cc: Dave Airlie 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Acked-by: Dave Airlie 
> > ---
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 24 +++
> >  1 file changed, 19 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index a8abc9af5ff4..30c4f0549ea0 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -491,16 +491,30 @@ eb_unreserve_vma(struct eb_vma *ev)
> > ev->flags &= ~__EXEC_OBJECT_RESERVED;
> >  }
> >  
> > +static bool platform_has_relocs_enabled(const struct i915_execbuffer *eb)
> > +{
> > +   /*
> > +* Relocations are disallowed starting from gen12 with Tigerlake
> > +* as an exception. We temporarily allow relocations for Rocketlake
> > +* and Alderlake when the require_force_probe flag is set.
> > +*/
> > +   if (GRAPHICS_VER(eb->i915) < 12 || IS_TIGERLAKE(eb->i915))
> > +   return true;
> > +
> > +   if (INTEL_INFO(eb->i915)->require_force_probe &&
> > +   (IS_ROCKETLAKE(eb->i915)
> 
> This ship has sailed... RKL is not protected by this flag any longer.
> Should this be on the TGL side now?

+Lucas

I think not; RKL has relocations disabled, so we cannot put it on the TGL
side. So if RKL is already released, then putting it under the
require_force_probe flag is wrong, and all I can do is remove it from
that condition. There's no option to unblock RKL on IGT CI until we
rewrite all the tests. We then have to rely on ADL* with the
require_force_probe flag to check how ADL will work with relocations.

> 
> >  || IS_ALDERLAKE_S(eb->i915) ||
> > +IS_ALDERLAKE_P(eb->i915)))
> 
> How to ensure that we will easily catch this when removing the
> flag?
> 
> I mean, should we have a GEM_BUG or drm_err message when these
> platforms in this list has not the required_force_probe?

I don't think we need GEM_BUG()/drm_err() - once the IGT tests support
both reloc and no-reloc, the condition will be reduced to:

if (GRAPHICS_VER(eb->i915) < 12 || IS_TIGERLAKE(eb->i915))
return true;
 
return false;

so the require_force_probe condition will be deleted and we won't need
it anymore (the IGTs will be ready by then).

--
Zbigniew

> 
> > +   return true;
> > +
> > +   return false;
> > +}
> > +
> >  static int
> >  eb_validate_vma(struct i915_execbuffer *eb,
> > struct drm_i915_gem_exec_object2 *entry,
> > struct i915_vma *vma)
> >  {
> > -   /* Relocations are disallowed for all platforms after TGL-LP.  This
> > -* also covers all platforms with local memory.
> > -*/
> > -   if (entry->relocation_count &&
> > -   GRAPHICS_VER(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915))
> > +   if (entry->relocation_count && !platform_has_relocs_enabled(eb))
> > return -EINVAL;
> >  
> > if (unlikely(entry->flags & eb->invalid_flags))
> > -- 
> > 2.26.0
> > 
> > ___
> > Intel-gfx mailing list
> > intel-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[PATCH] drm/i915/selftests: Reorder tasklet_disable vs local_bh_disable

2021-06-10 Thread Thomas Hellström
From: Chris Wilson 

Due to a change in requirements that disallows tasklet_disable() being
called from atomic context, rearrange the selftest to avoid doing so.

<3> [324.942939] BUG: sleeping function called from invalid context at 
kernel/softirq.c:888
<3> [324.942952] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 5601, 
name: i915_selftest
<4> [324.942960] 1 lock held by i915_selftest/5601:
<4> [324.942963]  #0: 888101d19240 (&dev->mutex){}-{3:3}, at: 
device_driver_attach+0x18/0x50
<3> [324.942987] Preemption disabled at:
<3> [324.942990] [] live_hold_reset.part.65+0xc2/0x2f0 [i915]
<4> [324.943255] CPU: 0 PID: 5601 Comm: i915_selftest Tainted: G U  
  5.13.0-rc5-CI-CI_DRM_10197+ #1
<4> [324.943259] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), 
BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4> [324.943263] Call Trace:
<4> [324.943267]  dump_stack+0x7f/0xad
<4> [324.943276]  ___might_sleep.cold.123+0xf2/0x106
<4> [324.943286]  tasklet_unlock_wait+0x2e/0xb0
<4> [324.943291]  ? ktime_get_raw+0x81/0x120
<4> [324.943305]  live_hold_reset.part.65+0x1ab/0x2f0 [i915]
<4> [324.943500]  __i915_subtests.cold.7+0x42/0x92 [i915]
<4> [324.943723]  ? __i915_live_teardown+0x50/0x50 [i915]
<4> [324.943922]  ? __intel_gt_live_setup+0x30/0x30 [i915]

Fixes: da044747401fc ("tasklets: Replace spin wait in tasklet_unlock_wait()")
Signed-off-by: Chris Wilson 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gt/selftest_execlists.c | 55 
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index ea2203af0764..1c8108d30b85 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -551,6 +551,32 @@ static int live_pin_rewind(void *arg)
return err;
 }
 
+static int engine_lock_reset_tasklet(struct intel_engine_cs *engine)
+{
+   tasklet_disable(&engine->execlists.tasklet);
+   local_bh_disable();
+
+   if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
+&engine->gt->reset.flags)) {
+   local_bh_enable();
+   tasklet_enable(&engine->execlists.tasklet);
+
+   intel_gt_set_wedged(engine->gt);
+   return -EBUSY;
+   }
+
+   return 0;
+}
+
+static void engine_unlock_reset_tasklet(struct intel_engine_cs *engine)
+{
+   clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+ &engine->gt->reset.flags);
+
+   local_bh_enable();
+   tasklet_enable(&engine->execlists.tasklet);
+}
+
 static int live_hold_reset(void *arg)
 {
struct intel_gt *gt = arg;
@@ -598,15 +624,9 @@ static int live_hold_reset(void *arg)
 
/* We have our request executing, now remove it and reset */
 
-   local_bh_disable();
-   if (test_and_set_bit(I915_RESET_ENGINE + id,
-   &gt->reset.flags)) {
-   local_bh_enable();
-   intel_gt_set_wedged(gt);
-   err = -EBUSY;
+   err = engine_lock_reset_tasklet(engine);
+   if (err)
goto out;
-   }
-   tasklet_disable(&engine->execlists.tasklet);
 
engine->execlists.tasklet.callback(&engine->execlists.tasklet);
GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
@@ -618,10 +638,7 @@ static int live_hold_reset(void *arg)
__intel_engine_reset_bh(engine, NULL);
GEM_BUG_ON(rq->fence.error != -EIO);
 
-   tasklet_enable(&engine->execlists.tasklet);
-   clear_and_wake_up_bit(I915_RESET_ENGINE + id,
- &gt->reset.flags);
-   local_bh_enable();
+   engine_unlock_reset_tasklet(engine);
 
/* Check that we do not resubmit the held request */
if (!i915_request_wait(rq, 0, HZ / 5)) {
@@ -4585,15 +4602,9 @@ static int reset_virtual_engine(struct intel_gt *gt,
GEM_BUG_ON(engine == ve->engine);
 
/* Take ownership of the reset and tasklet */
-   local_bh_disable();
-   if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-   &gt->reset.flags)) {
-   local_bh_enable();
-   intel_gt_set_wedged(gt);
-   err = -EBUSY;
+   err = engine_lock_reset_tasklet(engine);
+   if (err)
goto out_heartbeat;
-   }
-   tasklet_disable(&engine->execlists.tasklet);
 
engine->execlists.tasklet.callback(&engine->execlists.tasklet);
GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
@@ -4612,9 +4623,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
GEM_BUG_ON(rq->fence.error != -EIO);
 
/* Release our grasp on the engine, letting CS flow again */
-   

Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-06-10 Thread Lionel Landwerlin

On 10/06/2021 23:46, john.c.harri...@intel.com wrote:

From: John Harrison 

Various UMDs need to know the L3 bank count. So add a query API for it.

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
  drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
  drivers/gpu/drm/i915/i915_query.c  | 22 ++
  drivers/gpu/drm/i915/i915_reg.h|  1 +
  include/uapi/drm/i915_drm.h|  1 +
  5 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 2161bf01ef8b..708bb3581d83 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info,
  
  	intel_sseu_dump(&info->sseu, p);

  }
+
+int intel_gt_get_l3bank_count(struct intel_gt *gt)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_wakeref_t wakeref;
+   u32 fuse3;
+
+   if (GRAPHICS_VER(i915) < 12)
+   return -ENODEV;
+
+   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+   fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
+
+   return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 7ec395cace69..46aa1cf4cf30 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
*gt)
  
  void intel_gt_info_print(const struct intel_gt_info *info,

 struct drm_printer *p);
+int intel_gt_get_l3bank_count(struct intel_gt *gt);
  
  void intel_gt_watchdog_work(struct work_struct *work);
  
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c

index 96bd8fb3e895..0e92bb2d21b2 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -10,6 +10,7 @@
  #include "i915_perf.h"
  #include "i915_query.h"
  #include 
+#include "gt/intel_gt.h"
  
  static int copy_query_item(void *query_hdr, size_t query_sz,

   u32 total_length,
@@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
*i915,
return hwconfig->size;
  }
  
+static int query_l3banks(struct drm_i915_private *i915,

+struct drm_i915_query_item *query_item)
+{
+   u32 banks;
+
+   if (query_item->length == 0)
+   return sizeof(banks);
+
+   if (query_item->length < sizeof(banks))
+   return -EINVAL;
+
+   banks = intel_gt_get_l3bank_count(&i915->gt);
+
+   if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
+&banks, sizeof(banks)))
+   return -EFAULT;
+
+   return sizeof(banks);
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
query_perf_config,
query_memregion_info,
query_hwconfig_table,
+   query_l3banks,
  };
  
  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index eb13c601d680..e9ba88fe3db7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #define   GEN10_MIRROR_FUSE3  _MMIO(0x9118)
  #define GEN10_L3BANK_PAIR_COUNT 4
  #define GEN10_L3BANK_MASK   0x0F
+#define GEN12_GT_L3_MODE_MASK 0xFF
  
  #define GEN8_EU_DISABLE0		_MMIO(0x9134)

  #define   GEN8_EU_DIS0_S0_MASK0xff
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 87d369cae22a..20d18cca5066 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_PERF_CONFIG  3
  #define DRM_I915_QUERY_MEMORY_REGIONS   4
  #define DRM_I915_QUERY_HWCONFIG_TABLE   5
+#define DRM_I915_QUERY_L3_BANK_COUNT 6



A little bit of documentation about the format of the return data would 
be nice :)
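The new query does follow the usual i915 query-item length handshake, which is most of the "format of the return data" in miniature: a zero length asks for the required size, a too-short buffer is an error, and otherwise sizeof(u32) bytes holding the bank count are written. A userspace-style model of that contract (simplified — no copy_to_user(), and MODEL_EINVAL is a local stand-in for the kernel's errno):

```c
#include <assert.h>
#include <string.h>

#define MODEL_EINVAL 22

/* Model of the query-item length negotiation used by query_l3banks():
 * probe with length == 0, then call again with a buffer of at least
 * sizeof(u32) bytes; the bank count is returned in that buffer. */
static int query_l3banks_model(unsigned int banks, void *buf, int length)
{
    if (length == 0)
        return (int)sizeof(banks);       /* size probe */
    if (length < (int)sizeof(banks))
        return -MODEL_EINVAL;            /* buffer too small */
    memcpy(buf, &banks, sizeof(banks));  /* copy_to_user() stand-in */
    return (int)sizeof(banks);
}
```

Userspace would call once with length 0 to size the buffer, then again with the real buffer — the same two-step pattern the other DRM_I915_QUERY_* items use.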



-Lionel
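For completeness, the arithmetic behind the returned value is a popcount over the inverted fuse field: set bits in GEN10_MIRROR_FUSE3 mark *disabled* banks, so the patch inverts, masks to the gen12 field and counts. A sketch of that computation — the mask matches the patch, but the fuse values below are purely illustrative:

```c
#include <assert.h>

#define GEN12_GT_L3_MODE_MASK 0xFF

static int l3_bank_count(unsigned int fuse3)
{
    /* fuse bits set = bank disabled; invert, mask, count set bits */
    unsigned int enabled = ~fuse3 & GEN12_GT_L3_MODE_MASK;
    int count = 0;

    while (enabled) {            /* hweight32() stand-in */
        count += enabled & 1;
        enabled >>= 1;
    }
    return count;
}
```

So a hypothetical fuse of 0x03 (two banks fused off) yields 6, and an all-zero fuse yields the full 8 banks of the field.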



  /* Must be kept compact -- no holes and well documented */
  
  	/**





Re: [PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Maxime Ripard
Hi,

On Thu, Jun 10, 2021 at 11:00:05PM +0200, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 7:47 PM Maxime Ripard  wrote:
> >
> > New KMS properties come with a bunch of requirements to prevent each
> > driver from running its own, inconsistent, set of properties,
> > eventually leading to issues like property conflicts, inconsistencies
> > between drivers and semantics, etc.
> >
> > Let's document what we expect.
> >
> > [Cc list snipped]
> > Reviewed-by: Daniel Vetter 
> > Signed-off-by: Maxime Ripard 
> >
> > ---
> >
> > Changes from v2:
> >   - Take into account the feedback from Laurent and Liviu to no longer
> > force generic properties, but prefix vendor-specific properties with
> > the vendor name
> 
> I'm pretty sure my r-b was without this ...

Yeah, sorry. I meant to tell you on IRC that you'd want to have a
second look, but I shouldn't have kept your r-b and caught you by
surprise indeed.

> Why exactly do we need this? KMS is meant to be fairly generic (bugs
> throw a wrench around here sometimes, and semantics can be tricky). If
> we open up the door to yolo vendor properties in upstream, then that
> goal is pretty much written off. And we've been there with vendor
> properties, it's a giantic mess.
> 
> Minimally drop my r-b, I'm definitely not in support of this idea.

So the argument Liviu and Laurent made was that in some cases, getting a
generic property right with only a couple of users is hard. So they
advocated for the right to keep non-generic properties. I can get the
argument, and no-one else said that was wrong, so it felt like the
consensus was there.

> If there's a strong consensus that we really need this then I'm not
> going to nack this, but this really needs a pile of acks from
> compositor folks that they're willing to live with the resulting
> fallout this will likely bring. Your cc list seems to have an absence
> of compositor folks, but instead every driver maintainer. That's
> backwards. We make uapi for userspace, not for kernel driver
> maintainers!

Right, but it's mostly about in-kernel rules though? And you're the one
who mentioned CC'ing the driver maintainers in the first iteration?

> tl;dr: I'd go back to v2. And then cc compositor folks on this to get
> their ack.

So, Pekka, Simon, is there anyone else I should Cc?

Thanks!
Maxime


signature.asc
Description: PGP signature


Re: [PATCH 1/4] drm/ttm: add a pointer to the allocating BO into ttm_resource

2021-06-10 Thread Intel

Hi, Christian,

I know you have a lot on your plate, and that the drm community is a bit 
lax about following the kernel patch submission guidelines, but now that 
we're also spinning up a number of Intel developers on TTM, could we 
please make a better effort with cover letters and commit messages so 
that they understand the purpose and end goal of a series? A reviewer 
shouldn't have to look at the last patch to get an understanding of what 
the series is doing and why.


On 6/10/21 1:05 PM, Christian König wrote:

We are going to need this for the next patch




and it allows us to clean
up amdgpu as well.


The amdgpu changes are not reflected in the commit title.




Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 47 -
  drivers/gpu/drm/ttm/ttm_resource.c  |  1 +
  include/drm/ttm/ttm_resource.h  |  1 +
  3 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 194f9eecf89c..8e3f5da44e4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -26,23 +26,12 @@
  
  #include "amdgpu.h"
  
-struct amdgpu_gtt_node {

-   struct ttm_buffer_object *tbo;
-   struct ttm_range_mgr_node base;
-};
-
  static inline struct amdgpu_gtt_mgr *
  to_gtt_mgr(struct ttm_resource_manager *man)
  {
return container_of(man, struct amdgpu_gtt_mgr, manager);
  }
  
-static inline struct amdgpu_gtt_node *

-to_amdgpu_gtt_node(struct ttm_resource *res)
-{
-   return container_of(res, struct amdgpu_gtt_node, base.base);
-}
-
  /**
   * DOC: mem_info_gtt_total
   *
@@ -107,9 +96,9 @@ const struct attribute_group amdgpu_gtt_mgr_attr_group = {
   */
  bool amdgpu_gtt_mgr_has_gart_addr(struct ttm_resource *res)
  {
-   struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res);
+   struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res);
  
-	return drm_mm_node_allocated(&node->base.mm_nodes[0]);

+   return drm_mm_node_allocated(&node->mm_nodes[0]);
  }
  
  /**

@@ -129,7 +118,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
  {
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
uint32_t num_pages = PFN_UP(tbo->base.size);
-   struct amdgpu_gtt_node *node;
+   struct ttm_range_mgr_node *node;
int r;
  
  	spin_lock(&mgr->lock);

@@ -141,19 +130,17 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
atomic64_sub(num_pages, &mgr->available);
spin_unlock(&mgr->lock);
  
-	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);

+   node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL);
if (!node) {
r = -ENOMEM;
goto err_out;
}
  
-	node->tbo = tbo;

-   ttm_resource_init(tbo, place, &node->base.base);
-
+   ttm_resource_init(tbo, place, &node->base);
if (place->lpfn) {
spin_lock(&mgr->lock);
r = drm_mm_insert_node_in_range(&mgr->mm,
-   &node->base.mm_nodes[0],
+   &node->mm_nodes[0],
num_pages, tbo->page_alignment,
0, place->fpfn, place->lpfn,
DRM_MM_INSERT_BEST);
@@ -161,14 +148,14 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
if (unlikely(r))
goto err_free;
  
-		node->base.base.start = node->base.mm_nodes[0].start;

+   node->base.start = node->mm_nodes[0].start;
} else {
-   node->base.mm_nodes[0].start = 0;
-   node->base.mm_nodes[0].size = node->base.base.num_pages;
-   node->base.base.start = AMDGPU_BO_INVALID_OFFSET;
+   node->mm_nodes[0].start = 0;
+   node->mm_nodes[0].size = node->base.num_pages;
+   node->base.start = AMDGPU_BO_INVALID_OFFSET;
}
  
-	*res = &node->base.base;

+   *res = &node->base;
return 0;
  
  err_free:

@@ -191,12 +178,12 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager 
*man,
  static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
   struct ttm_resource *res)
  {
-   struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res);
+   struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res);
struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
  
  	spin_lock(&mgr->lock);

-   if (drm_mm_node_allocated(&node->base.mm_nodes[0]))
-   drm_mm_remove_node(&node->base.mm_nodes[0]);
+   if (drm_mm_node_allocated(&node->mm_nodes[0]))
+   drm_mm_remove_node(&node->mm_nodes[0]);
spin_unlock(&mgr->lock);
atomic64_add(res->num_pages, &mgr->a

Re: linux-next: Tree for Jun 10 (drivers/gpu/drm/nouveau/dispnv50/disp.c)

2021-06-10 Thread Randy Dunlap

On 6/10/21 2:48 AM, Stephen Rothwell wrote:

Hi all,

Changes since 20210609:



on x86_64:

../drivers/gpu/drm/nouveau/dispnv50/disp.c: In function 
‘nv50_sor_atomic_disable’:
../drivers/gpu/drm/nouveau/dispnv50/disp.c:1665:52: error: ‘struct 
nouveau_connector’ has no member named ‘backlight’

  struct nouveau_backlight *backlight = nv_connector->backlight;
^~
../drivers/gpu/drm/nouveau/dispnv50/disp.c:1670:28: error: dereferencing 
pointer to incomplete type ‘struct nouveau_backlight’

  if (backlight && backlight->uses_dpcd) {
^~

Full randconfig file is attached.




config-r8561.gz
Description: application/gzip


Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Alistair Popple
On Friday, 11 June 2021 11:00:34 AM AEST Peter Xu wrote:
> On Fri, Jun 11, 2021 at 09:17:14AM +1000, Alistair Popple wrote:
> > On Friday, 11 June 2021 9:04:19 AM AEST Peter Xu wrote:
> > > On Fri, Jun 11, 2021 at 12:21:26AM +1000, Alistair Popple wrote:
> > > > > Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect 
> > > > > to explicit
> > > > > call split_huge_pmd_address(), afaict.  Since both of them use 
> > > > > __split_huge_pmd()
> > > > > internally which will generate that unwanted CLEAR notify.
> > > >
> > > > Agree that gup calls __split_huge_pmd() via split_huge_pmd_address()
> > > > which will always CLEAR. However gup only calls 
> > > > split_huge_pmd_address() if it
> > > > finds a thp pmd. In follow_pmd_mask() we have:
> > > >
> > > >   if (likely(!pmd_trans_huge(pmdval)))
> > > >   return follow_page_pte(vma, address, pmd, flags, 
> > > > &ctx->pgmap);
> > > >
> > > > So I don't think we have a problem here.
> > >
> > > Sorry I didn't follow here..  We do FOLL_SPLIT_PMD after this check, 
> > > right?  I
> > > mean, if it's a thp for the current mm, afaict pmd_trans_huge() should 
> > > return
> > > true above, so we'll skip follow_page_pte(); then we'll check 
> > > FOLL_SPLIT_PMD
> > > and do the split, then the CLEAR notify.  Hmm.. Did I miss something?
> >
> > That seems correct - if the thp is not mapped with a pmd we won't split and 
> > we
> > won't CLEAR. If there is a thp pmd we will split and CLEAR, but in that 
> > case it
> > is fine - we will retry, but the retry won't CLEAR because the pmd has
> > already been split.
> 
> Aha!
> 
> >
> > The issue arises with doing it unconditionally in make device exclusive is 
> > that
> > you *always* CLEAR even if there is no thp pmd to split. Or at least that's 
> > my
> > understanding, please let me know if it doesn't make sense.
> 
> Exactly.  But if you see what I meant here, even if it can work like this, it
> sounds still fragile, isn't it?  I just feel something is slightly off there..
> 
> IMHO split_huge_pmd() checked pmd before calling __split_huge_pmd() for
> performance, afaict, because if it's not a thp even without locking, then it
> won't be, so further __split_huge_pmd() is not necessary.
> 
> IOW, it's very legal if someday we'd like to let split_huge_pmd() call
> __split_huge_pmd() directly, then AFAIU device exclusive API will be the 1st
> one to be broken with that seems-to-be-irrelevant change I'm afraid..

Well I would argue the performance of memory notifiers is becoming increasingly
important, and a change that causes them to be called unnecessarily is
therefore not very legal. Likely the correct fix here is to optimise
__split_huge_pmd() to only call the notifier if it's actually going to split a
pmd. As you said though that's a completely different story which I think would
be best done as a separate series.

> This lets me go back a step to think about why we need this notifier at
> all to cover this whole range of make_device_exclusive() procedure..
> 
> What I am thinking is, we're afraid some CPU accesses this page so the pte got
> quickly restored when device atomic operation is carrying on.  Then with this
> notifier we'll be able to cancel it.  Makes perfect sense.
> 
> However do we really need to register this notifier so early?  The thing is 
> the
> GPU driver still has all the page locks, so even if there's a race to restore
> the ptes, they'll block at taking the page lock until the driver releases it.
> 
> IOW, I'm wondering whether the "non-fragile" way to do this is not do
> mmu_interval_notifier_insert() that early: what if we register that notifier
> after make_device_exclusive_range() returns but before page_unlock() somehow?
> So before page_unlock(), race is protected fully by the lock itself; after
> that, it's done by mmu notifier.  Then maybe we don't need to worry about all
> these notifications during marking exclusive (while we shouldn't)?

The notifier is needed to protect against races with pte changes. Once a page
has been marked for exclusive access the driver will update its page tables to
allow atomic access to the page. However in the meantime the page could become
unmapped entirely or write protected.

As I understand things the page lock won't protect against these kind of pte
changes, hence the need for mmu_interval_read_begin/retry which allows the
driver to hold a mutex protecting against invalidations via blocking the
notifier until the device page tables have been updated.

> Sorry in advance if I overlooked anything as I know little on device side 
> (even
> less than mm itself).  Also sorry to know that this series got marked
> to-be-update in -mm; hopefully it'll still land soon even if it still needs
> some rebase to other more important bugfixes - I definitely jumped in too late
> even if to mess this all up. :-)

I was thinking that was probably coming anyway, but I'm still hoping it will be
just a rebase on Hugh's work 

[git pull] drm fixes for 5.13-rc6

2021-06-10 Thread Dave Airlie
Hey Linus,

Another week of fixes, nothing too crazy, but a few all over the
place, two locking fixes in the core/ttm area, a couple of small
driver fixes (radeon, sun4i, mcde, vc4). Then msm and amdgpu have a
set of fixes each, mostly for smaller things, though the msm has a DSI
fix for a black screen. I haven't seen any intel fixes this week, so
they may have a few that may or may not wait for next week.

Dave.

drm-fixes-2021-06-11:
drm fixes for 5.13-rc6

drm:
- auth locking fix

ttm:
- locking fix

amdgpu:
- Use kvzalloc in amdgpu_bo_create
- Use drm_dbg_kms for reporting failure to get a GEM FB
- Fix some register offsets for Sienna Cichlid
- Fix fall-through warning

radeon:
- memcpy_to/from_io fixes

msm:
- NULL ptr deref fix
- CP_PROTECT reg programming fix
- incorrect register shift fix
- DSI blank screen fix

sun4i:
- hdmi output probing fix

mcde:
- DSI pipeline calc fix

vc4:
- out of bounds fix

The following changes since commit 614124bea77e452aa6df7a8714e8bc820b489922:

  Linux 5.13-rc5 (2021-06-06 15:47:27 -0700)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-06-11

for you to fetch changes up to 7de5c0d70c779454785dd2431707df5b841eaeaf:

  Merge tag 'amd-drm-fixes-5.13-2021-06-09' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes (2021-06-11
11:17:10 +1000)


drm fixes for 5.13-rc6

drm:
- auth locking fix

ttm:
- locking fix

amdgpu:
- Use kvzalloc in amdgpu_bo_create
- Use drm_dbg_kms for reporting failure to get a GEM FB
- Fix some register offsets for Sienna Cichlid
- Fix fall-through warning

radeon:
- memcpy_to/from_io fixes

msm:
- NULL ptr deref fix
- CP_PROTECT reg programming fix
- incorrect register shift fix
- DSI blank screen fix

sun4i:
- hdmi output probing fix

mcde:
- DSI pipeline calc fix

vc4:
- out of bounds fix


Alexey Minnekhanov (1):
  drm/msm: Init mm_list before accessing it for use_vram path

Changfeng (1):
  drm/amdgpu: switch kzalloc to kvzalloc in amdgpu_bo_create

Chen Li (1):
  radeon: use memcpy_to/fromio for UVD fw upload

Christian König (1):
  drm/ttm: fix deref of bo->ttm without holding the lock v2

Dave Airlie (3):
  Merge tag 'drm-msm-fixes-2021-06-10' of
https://gitlab.freedesktop.org/drm/msm into drm-fixes
  Merge tag 'drm-misc-fixes-2021-06-10' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
  Merge tag 'amd-drm-fixes-5.13-2021-06-09' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

Desmond Cheong Zhi Xi (2):
  drm: Fix use-after-free read in drm_getunique()
  drm: Lock pointer access in drm_master_release()

Gustavo A. R. Silva (1):
  drm/amd/pm: Fix fall-through warning for Clang

Jonathan Marek (3):
  drm/msm/a6xx: update/fix CP_PROTECT initialization
  drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
  drm/msm/a6xx: avoid shadow NULL reference in failure path

Linus Walleij (1):
  drm/mcde: Fix off by 10^3 in calculation

Mark Rutland (1):
  drm/vc4: fix vc4_atomic_commit_tail() logic

Michel Dänzer (1):
  drm/amdgpu: Use drm_dbg_kms for reporting failure to get a GEM FB

Rohit Khaire (1):
  drm/amdgpu: Fix incorrect register offsets for Sienna Cichlid

Saravana Kannan (1):
  drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device

Stephen Boyd (1):
  drm/msm/dsi: Stash away calculated vco frequency on recalc

 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  26 +++-
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c   |   1 +
 drivers/gpu/drm/drm_auth.c |   3 +-
 drivers/gpu/drm/drm_ioctl.c|   9 +-
 drivers/gpu/drm/mcde/mcde_dsi.c|   2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 155 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   2 +-
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_10nm.c |   1 +
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c  |   1 +
 drivers/gpu/drm/msm/msm_gem.c  |   7 +
 drivers/gpu/drm/radeon/radeon_uvd.c|   4 +-
 drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c  |  31 -
 drivers/gpu/drm/sun4i/sun8i_dw_hdmi.h  |   5 +-
 drivers/gpu/drm/sun4i/sun8i_hdmi_phy.c |  41 +-
 drivers/gpu/drm/ttm/ttm_bo.c   |   5 +-
 drivers/gpu/drm/ttm/ttm_device.c   |   8 +-
 drivers/gpu/drm/vc4/vc4_kms.c  |   2 +-
 19 files changed, 232 insertions(+), 79 deletions(-)


Re: [PATCH] drm: Lock pointer access in drm_master_release()

2021-06-10 Thread Desmond Cheong Zhi Xi

On 11/6/21 1:49 am, Emil Velikov wrote:

On Thu, 10 Jun 2021 at 11:10, Daniel Vetter  wrote:


On Wed, Jun 09, 2021 at 05:21:19PM +0800, Desmond Cheong Zhi Xi wrote:

This patch eliminates the following smatch warning:
drivers/gpu/drm/drm_auth.c:320 drm_master_release() warn: unlocked access 'master' 
(line 318) expected lock '&dev->master_mutex'

The 'file_priv->master' field should be protected by the mutex lock to
'&dev->master_mutex'. This is because other processes can concurrently
modify this field and free the current 'file_priv->master'
pointer. This could result in a use-after-free error when 'master' is
dereferenced in subsequent function calls to
'drm_legacy_lock_master_cleanup()' or to 'drm_lease_revoke()'.

An example of a scenario that would produce this error can be seen
from a similar bug in 'drm_getunique()' that was reported by Syzbot:
https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f803

In the Syzbot report, another process concurrently acquired the
device's master mutex in 'drm_setmaster_ioctl()', then overwrote
'fpriv->master' in 'drm_new_set_master()'. The old value of
'fpriv->master' was subsequently freed before the mutex was unlocked.

Reported-by: Dan Carpenter 
Signed-off-by: Desmond Cheong Zhi Xi 


Thanks a lot. I've done an audit of this code, and I found another
potential problem in drm_is_current_master. The callers from drm_auth.c
hold the dev->master_mutex, but all the external ones dont. I think we
need to split this into a _locked function for use within drm_auth.c, and
the exported one needs to grab the dev->master_mutex while it's checking
master status. Ofc there will still be races, those are ok, but right now
we run the risk of use-after free problems in drm_lease_owner.


Note that some code does acquire the mutex via
drm_master_internal_acquire - so we should be careful.
As mentioned elsewhere - having a _locked version of
drm_is_current_master sounds good.

Might as well throw a lockdep_assert_held_once in there just in case :-P

Happy to help review the follow-up patches.
-Emil



Thanks for the advice, Emil!

I did a preliminary check on the code that calls 
drm_master_internal_acquire in drm_client_modeset.c and drm_fb_helper.c, 
and it doesn't seem like they eventually call drm_is_current_master. So 
we should be good on that front.


lockdep_assert_held_once sounds good :)

Best wishes,
Desmond



[GIT PULL] exynos-drm-next

2021-06-10 Thread Inki Dae
Hi Dave,

   Just two cleanups to replace pm_runtime_get_sync() with
   pm_runtime_resume_and_get().

   Please kindly let me know if there is any problem.

Thanks,
Inki Dae

The following changes since commit c707b73f0cfb1acc94a20389aecde65e6385349b:

  Merge tag 'amd-drm-next-5.14-2021-06-09' of 
https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-06-10 13:47:13 
+1000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos 
tags/exynos-drm-next-for-v5.14

for you to fetch changes up to 445d3bed75de4082c7c7794030ac9a5b8bfde886:

  drm/exynos: use pm_runtime_resume_and_get() (2021-06-11 10:56:38 +0900)


Two cleanups
- These patches make the Exynos DRM driver use pm_runtime_resume_and_get()
  instead of pm_runtime_get_sync() to deal with the usage counter.
  pm_runtime_get_sync() increases the usage counter even when it fails,
  which could make callers forget to decrease the usage counter.
  pm_runtime_resume_and_get() decreases the usage counter again when it
  fails, so the counter is only left raised on success.


Inki Dae (1):
  drm/exynos: use pm_runtime_resume_and_get()

Tian Tao (1):
  drm/exynos: Use pm_runtime_resume_and_get() to replace open coding

 drivers/gpu/drm/exynos/exynos5433_drm_decon.c |  7 ++-
 drivers/gpu/drm/exynos/exynos7_drm_decon.c|  7 ++-
 drivers/gpu/drm/exynos/exynos_drm_dsi.c   |  7 ++-
 drivers/gpu/drm/exynos/exynos_drm_fimc.c  |  8 +++-
 drivers/gpu/drm/exynos/exynos_drm_fimd.c  | 25 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   |  9 -
 drivers/gpu/drm/exynos/exynos_drm_gsc.c   |  7 ++-
 drivers/gpu/drm/exynos/exynos_drm_mic.c   |  6 ++
 drivers/gpu/drm/exynos/exynos_drm_rotator.c   |  7 ++-
 drivers/gpu/drm/exynos/exynos_drm_scaler.c| 10 ++
 drivers/gpu/drm/exynos/exynos_hdmi.c  |  8 +++-
 drivers/gpu/drm/exynos/exynos_mixer.c |  7 ++-
 12 files changed, 86 insertions(+), 22 deletions(-)


Re: [PATCH] drm: Lock pointer access in drm_master_release()

2021-06-10 Thread Desmond Cheong Zhi Xi

On 11/6/21 12:48 am, Daniel Vetter wrote:

On Thu, Jun 10, 2021 at 11:21:39PM +0800, Desmond Cheong Zhi Xi wrote:

On 10/6/21 6:10 pm, Daniel Vetter wrote:

On Wed, Jun 09, 2021 at 05:21:19PM +0800, Desmond Cheong Zhi Xi wrote:

This patch eliminates the following smatch warning:
drivers/gpu/drm/drm_auth.c:320 drm_master_release() warn: unlocked access 'master' 
(line 318) expected lock '&dev->master_mutex'

The 'file_priv->master' field should be protected by the mutex lock to
'&dev->master_mutex'. This is because other processes can concurrently
modify this field and free the current 'file_priv->master'
pointer. This could result in a use-after-free error when 'master' is
dereferenced in subsequent function calls to
'drm_legacy_lock_master_cleanup()' or to 'drm_lease_revoke()'.

An example of a scenario that would produce this error can be seen
from a similar bug in 'drm_getunique()' that was reported by Syzbot:
https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f803

In the Syzbot report, another process concurrently acquired the
device's master mutex in 'drm_setmaster_ioctl()', then overwrote
'fpriv->master' in 'drm_new_set_master()'. The old value of
'fpriv->master' was subsequently freed before the mutex was unlocked.

Reported-by: Dan Carpenter 
Signed-off-by: Desmond Cheong Zhi Xi 


Thanks a lot. I've done an audit of this code, and I found another
potential problem in drm_is_current_master. The callers from drm_auth.c
hold the dev->master_mutex, but all the external ones dont. I think we
need to split this into a _locked function for use within drm_auth.c, and
the exported one needs to grab the dev->master_mutex while it's checking
master status. Ofc there will still be races, those are ok, but right now
we run the risk of use-after free problems in drm_lease_owner.

Are you up to do that fix too?



Hi Daniel,

Thanks for the pointer, I'm definitely up for it!


I think the drm_lease.c code also needs an audit, there we'd need to make
sure that we hold hold either the lock or a full master reference to avoid
the use-after-free issues here.



I'd be happy to look into drm_lease.c as well.


Patch merged to drm-misc-fixes with cc: stable.
-Daniel


---
   drivers/gpu/drm/drm_auth.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index f00e5abdbbf4..b59b26a71ad5 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -315,9 +315,10 @@ int drm_master_open(struct drm_file *file_priv)
   void drm_master_release(struct drm_file *file_priv)
   {
struct drm_device *dev = file_priv->minor->dev;
-   struct drm_master *master = file_priv->master;
+   struct drm_master *master;

mutex_lock(&dev->master_mutex);
+   master = file_priv->master;
if (file_priv->magic)
idr_remove(&file_priv->master->magic_map, file_priv->magic);
--
2.25.1





 From what I can see, there are other places in the kernel that could use the
_locked version of drm_is_current_master as well, such as drm_mode_getfb in
drm_framebuffer.c. I'll take a closer look, and if the changes make sense
I'll prepare a patch series for them.


Oh maybe we have a naming confusion: the _locked is the one where the
caller must grab the lock already, whereas drm_is_current_master would
grab the master_mutex internally to do the check. The one in
drm_framebuffer.c looks like it'd need the internal one since there's no
other need to grab the master_mutex.
-Daniel



Ah ok got it, I think I confused myself earlier.

Just to check, may I include you in a Reported-by: tag?


Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Peter Xu
On Fri, Jun 11, 2021 at 09:17:14AM +1000, Alistair Popple wrote:
> On Friday, 11 June 2021 9:04:19 AM AEST Peter Xu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Fri, Jun 11, 2021 at 12:21:26AM +1000, Alistair Popple wrote:
> > > > Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to 
> > > > explicit
> > > > call split_huge_pmd_address(), afaict.  Since both of them use 
> > > > __split_huge_pmd()
> > > > internally which will generate that unwanted CLEAR notify.
> > >
> > > Agree that gup calls __split_huge_pmd() via split_huge_pmd_address()
> > > which will always CLEAR. However gup only calls split_huge_pmd_address() 
> > > if it
> > > finds a thp pmd. In follow_pmd_mask() we have:
> > >
> > >   if (likely(!pmd_trans_huge(pmdval)))
> > >   return follow_page_pte(vma, address, pmd, flags, 
> > > &ctx->pgmap);
> > >
> > > So I don't think we have a problem here.
> > 
> > Sorry I didn't follow here..  We do FOLL_SPLIT_PMD after this check, right? 
> >  I
> > mean, if it's a thp for the current mm, afaict pmd_trans_huge() should 
> > return
> > true above, so we'll skip follow_page_pte(); then we'll check FOLL_SPLIT_PMD
> > and do the split, then the CLEAR notify.  Hmm.. Did I miss something?
> 
> That seems correct - if the thp is not mapped with a pmd we won't split and we
> won't CLEAR. If there is a thp pmd we will split and CLEAR, but in that case 
> it
> is fine - we will retry, but the retry won't CLEAR because the pmd has
> already been split.

Aha!

> 
> The issue arises with doing it unconditionally in make device exclusive is 
> that
> you *always* CLEAR even if there is no thp pmd to split. Or at least that's my
> understanding, please let me know if it doesn't make sense.

Exactly.  But if you see what I meant here, even if it can work like this, it
sounds still fragile, isn't it?  I just feel something is slightly off there..

IMHO split_huge_pmd() checked pmd before calling __split_huge_pmd() for
performance, afaict, because if it's not a thp even without locking, then it
won't be, so further __split_huge_pmd() is not necessary.

IOW, it's very legal if someday we'd like to let split_huge_pmd() call
__split_huge_pmd() directly, then AFAIU device exclusive API will be the 1st
one to be broken with that seems-to-be-irrelevant change I'm afraid..

This lets me go back a step to think about why we need this notifier at
all to cover this whole range of make_device_exclusive() procedure..

What I am thinking is, we're afraid some CPU accesses this page so the pte got
quickly restored when device atomic operation is carrying on.  Then with this
notifier we'll be able to cancel it.  Makes perfect sense.

However do we really need to register this notifier so early?  The thing is the
GPU driver still has all the page locks, so even if there's a race to restore
the ptes, they'll block at taking the page lock until the driver releases it.

IOW, I'm wondering whether the "non-fragile" way to do this is not do
mmu_interval_notifier_insert() that early: what if we register that notifier
after make_device_exclusive_range() returns but before page_unlock() somehow?
So before page_unlock(), race is protected fully by the lock itself; after
that, it's done by mmu notifier.  Then maybe we don't need to worry about all
these notifications during marking exclusive (while we shouldn't)?

Sorry in advance if I overlooked anything as I know little on device side (even
less than mm itself).  Also sorry to know that this series got marked
to-be-update in -mm; hopefully it'll still land soon even if it still needs
some rebase to other more important bugfixes - I definitely jumped in too late
even if to mess this all up. :-)

-- 
Peter Xu



[PATCH 0/1] Relax CTB response timeout

2021-06-10 Thread Matthew Brost
A previous version of patch [1] was NACK'd because it introduced a
Kconfig option. We agreed on a larger timeout value if problems were
shown with the current timeout value. A problem was shown in CI [2],
let's increase the timeout.

[1] https://patchwork.freedesktop.org/patch/436623/
[2] 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7775/fi-cfl-8700k/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-a.html#dmesg-warnings385

Signed-off-by: Matthew Brost 

Matthew Brost (1):
  drm/i915/guc: Relax CTB response timeout

 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.28.0



[PATCH 1/1] drm/i915/guc: Relax CTB response timeout

2021-06-10 Thread Matthew Brost
In an upcoming patch we will allow more CTB requests to be sent in
parallel to the GuC for processing, so we shouldn't assume any more
that the GuC will always reply within 10ms.

Use a bigger hardcoded value of 1s instead.

v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
v3:
 (Daniel Vetter)
  - Use hardcoded value of 1s rather than config option

Signed-off-by: Matthew Brost 
Cc: Michal Wajdeczko 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 8f7b148fef58..bc626ca0a9eb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -475,12 +475,14 @@ static int wait_for_ct_request_update(struct ct_request 
*req, u32 *status)
/*
 * Fast commands should complete in less than 10us, so sample quickly
 * up to that length of time, then switch to a slower sleep-wait loop.
-* No GuC command should ever take longer than 10ms.
+* No GuC command should ever take longer than 10ms but many GuC
+* commands can be in flight at a time, so use a 1s timeout on the slower
+* sleep-wait loop.
 */
 #define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status))
err = wait_for_us(done, 10);
if (err)
-   err = wait_for(done, 10);
+   err = wait_for(done, 1000);
 #undef done
 
if (unlikely(err))
-- 
2.28.0



[PATCH v1 2/3] drm/virtio: Prepare resource_flush to accept a fence

2021-06-10 Thread Vivek Kasireddy
A fence will be added to resource_flush for resources that
are guest blobs.

Cc: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h | 4 +++-
 drivers/gpu/drm/virtio/virtgpu_vq.c  | 7 +++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index d9dbc4f258f3..d4e610a44e12 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -315,7 +315,9 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct 
virtio_gpu_device *vgdev,
 void virtio_gpu_cmd_resource_flush(struct virtio_gpu_device *vgdev,
   uint32_t resource_id,
   uint32_t x, uint32_t y,
-  uint32_t width, uint32_t height);
+  uint32_t width, uint32_t height,
+  struct virtio_gpu_object_array *objs,
+  struct virtio_gpu_fence *fence);
 void virtio_gpu_cmd_set_scanout(struct virtio_gpu_device *vgdev,
uint32_t scanout_id, uint32_t resource_id,
uint32_t width, uint32_t height,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index cf84d382dd41..2e71e91278b4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -576,13 +576,16 @@ void virtio_gpu_cmd_set_scanout(struct virtio_gpu_device 
*vgdev,
 void virtio_gpu_cmd_resource_flush(struct virtio_gpu_device *vgdev,
   uint32_t resource_id,
   uint32_t x, uint32_t y,
-  uint32_t width, uint32_t height)
+  uint32_t width, uint32_t height,
+  struct virtio_gpu_object_array *objs,
+  struct virtio_gpu_fence *fence)
 {
struct virtio_gpu_resource_flush *cmd_p;
struct virtio_gpu_vbuffer *vbuf;
 
cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
memset(cmd_p, 0, sizeof(*cmd_p));
+   vbuf->objs = objs;
 
cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_FLUSH);
cmd_p->resource_id = cpu_to_le32(resource_id);
@@ -591,7 +594,7 @@ void virtio_gpu_cmd_resource_flush(struct virtio_gpu_device 
*vgdev,
cmd_p->r.x = cpu_to_le32(x);
cmd_p->r.y = cpu_to_le32(y);
 
-   virtio_gpu_queue_ctrl_buffer(vgdev, vbuf);
+   virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, fence);
 }
 
 void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
-- 
2.30.2



[PATCH v1 3/3] drm/virtio: Add the fence in resource_flush if present

2021-06-10 Thread Vivek Kasireddy
If the framebuffer associated with the plane contains a fence, then
it is added to resource_flush and will be waited upon for a max of
50 msecs or until it is signalled by the Host.

Cc: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 drivers/gpu/drm/virtio/virtgpu_plane.c | 45 ++
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c 
b/drivers/gpu/drm/virtio/virtgpu_plane.c
index dd7a1f2db9ad..a49fd9480381 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -129,6 +129,40 @@ static void virtio_gpu_update_dumb_bo(struct 
virtio_gpu_device *vgdev,
   objs, NULL);
 }
 
+static void virtio_gpu_resource_flush(struct drm_plane *plane,
+ uint32_t x, uint32_t y,
+ uint32_t width, uint32_t height)
+{
+   struct drm_device *dev = plane->dev;
+   struct virtio_gpu_device *vgdev = dev->dev_private;
+   struct virtio_gpu_framebuffer *vgfb;
+   struct virtio_gpu_object *bo;
+
+   vgfb = to_virtio_gpu_framebuffer(plane->state->fb);
+   bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
+   if (vgfb->fence) {
+   struct virtio_gpu_object_array *objs;
+
+   objs = virtio_gpu_array_alloc(1);
+   if (!objs)
+   return;
+   virtio_gpu_array_add_obj(objs, vgfb->base.obj[0]);
+   virtio_gpu_array_lock_resv(objs);
+   virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y,
+ width, height, objs, vgfb->fence);
+   virtio_gpu_notify(vgdev);
+
+   dma_fence_wait_timeout(&vgfb->fence->f, true,
+  msecs_to_jiffies(50));
+   dma_fence_put(&vgfb->fence->f);
+   vgfb->fence = NULL;
+   } else {
+   virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle, x, y,
+ width, height, NULL, NULL);
+   virtio_gpu_notify(vgdev);
+   }
+}
+
 static void virtio_gpu_primary_plane_update(struct drm_plane *plane,
struct drm_atomic_state *state)
 {
@@ -198,12 +232,11 @@ static void virtio_gpu_primary_plane_update(struct 
drm_plane *plane,
}
}
 
-   virtio_gpu_cmd_resource_flush(vgdev, bo->hw_res_handle,
- rect.x1,
- rect.y1,
- rect.x2 - rect.x1,
- rect.y2 - rect.y1);
-   virtio_gpu_notify(vgdev);
+   virtio_gpu_resource_flush(plane,
+ rect.x1,
+ rect.y1,
+ rect.x2 - rect.x1,
+ rect.y2 - rect.y1);
 }
 
 static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
-- 
2.30.2



[PATCH v1 1/3] drm/virtio: Add fences for Guest blobs

2021-06-10 Thread Vivek Kasireddy
Add prepare and cleanup routines for primary planes as well
where a fence is added only if the BO/FB associated with the
plane is a guest blob.

Cc: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 drivers/gpu/drm/virtio/virtgpu_plane.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c 
b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 4e1b17548007..dd7a1f2db9ad 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -206,8 +206,8 @@ static void virtio_gpu_primary_plane_update(struct 
drm_plane *plane,
virtio_gpu_notify(vgdev);
 }
 
-static int virtio_gpu_cursor_prepare_fb(struct drm_plane *plane,
-   struct drm_plane_state *new_state)
+static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
+  struct drm_plane_state *new_state)
 {
struct drm_device *dev = plane->dev;
struct virtio_gpu_device *vgdev = dev->dev_private;
@@ -219,7 +219,10 @@ static int virtio_gpu_cursor_prepare_fb(struct drm_plane 
*plane,
 
vgfb = to_virtio_gpu_framebuffer(new_state->fb);
bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
-   if (bo && bo->dumb && (plane->state->fb != new_state->fb)) {
+   if (!bo || (plane->type == DRM_PLANE_TYPE_PRIMARY && !bo->guest_blob))
+   return 0;
+
+   if (bo->dumb && (plane->state->fb != new_state->fb)) {
vgfb->fence = virtio_gpu_fence_alloc(vgdev);
if (!vgfb->fence)
return -ENOMEM;
@@ -228,8 +231,8 @@ static int virtio_gpu_cursor_prepare_fb(struct drm_plane 
*plane,
return 0;
 }
 
-static void virtio_gpu_cursor_cleanup_fb(struct drm_plane *plane,
-struct drm_plane_state *old_state)
+static void virtio_gpu_plane_cleanup_fb(struct drm_plane *plane,
+   struct drm_plane_state *old_state)
 {
struct virtio_gpu_framebuffer *vgfb;
 
@@ -321,13 +324,15 @@ static void virtio_gpu_cursor_plane_update(struct 
drm_plane *plane,
 }
 
 static const struct drm_plane_helper_funcs virtio_gpu_primary_helper_funcs = {
+   .prepare_fb = virtio_gpu_plane_prepare_fb,
+   .cleanup_fb = virtio_gpu_plane_cleanup_fb,
.atomic_check   = virtio_gpu_plane_atomic_check,
.atomic_update  = virtio_gpu_primary_plane_update,
 };
 
 static const struct drm_plane_helper_funcs virtio_gpu_cursor_helper_funcs = {
-   .prepare_fb = virtio_gpu_cursor_prepare_fb,
-   .cleanup_fb = virtio_gpu_cursor_cleanup_fb,
+   .prepare_fb = virtio_gpu_plane_prepare_fb,
+   .cleanup_fb = virtio_gpu_plane_cleanup_fb,
.atomic_check   = virtio_gpu_plane_atomic_check,
.atomic_update  = virtio_gpu_cursor_plane_update,
 };
-- 
2.30.2



[PATCH v1 0/3] drm/virtio: Add a default synchronization mechanism for blobs

2021-06-10 Thread Vivek Kasireddy
This 3 patch series is the counterpart for this other series:
https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg02906.html

It makes it possible for the Guest to wait until the Host has
completely consumed its FB before reusing it again, thereby ensuring
that both parties don't access it at the same time.

Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Tina Zhang 

Vivek Kasireddy (3):
  drm/virtio: Add fences for Guest blobs
  drm/virtio: Prepare resource_flush to accept a fence
  drm/virtio: Add the fence in resource_flush if present

 drivers/gpu/drm/virtio/virtgpu_drv.h   |  4 +++-
 drivers/gpu/drm/virtio/virtgpu_plane.c | 64 --
 drivers/gpu/drm/virtio/virtgpu_vq.c|  7 +++++--
 3 files changed, 59 insertions(+), 16 deletions(-)

-- 
2.30.2



Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Alistair Popple
On Friday, 11 June 2021 9:04:19 AM AEST Peter Xu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Jun 11, 2021 at 12:21:26AM +1000, Alistair Popple wrote:
> > > Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to 
> > > explicit
> > > call split_huge_pmd_address(), afaict.  Since both of them use 
> > > __split_huge_pmd()
> > > internally which will generate that unwanted CLEAR notify.
> >
> > Agree that gup calls __split_huge_pmd() via split_huge_pmd_address()
> > which will always CLEAR. However gup only calls split_huge_pmd_address() if 
> > it
> > finds a thp pmd. In follow_pmd_mask() we have:
> >
> >   if (likely(!pmd_trans_huge(pmdval)))
> >   return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> >
> > So I don't think we have a problem here.
> 
> Sorry I didn't follow here..  We do FOLL_SPLIT_PMD after this check, right?  I
> mean, if it's a thp for the current mm, afaict pmd_trans_huge() should return
> true above, so we'll skip follow_page_pte(); then we'll check FOLL_SPLIT_PMD
> and do the split, then the CLEAR notify.  Hmm.. Did I miss something?

That seems correct - if the thp is not mapped with a pmd we won't split and we
won't CLEAR. If there is a thp pmd we will split and CLEAR, but in that case it
is fine - we will retry, but the retry won't CLEAR because the pmd has
already been split.

The issue with doing it unconditionally in make device exclusive is that
you *always* CLEAR even if there is no thp pmd to split. Or at least that's my
understanding, please let me know if it doesn't make sense.
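
For readers following along, the retry semantics described above can be
loosely modelled in userspace as a sequence-count loop (names and structure
are illustrative, not the kernel's mmu_interval_notifier API):

```c
#include <assert.h>
#include <stdbool.h>

/* Loose analogy of the mmu_interval_read_begin()/_retry() pattern: an
 * invalidation bumps a sequence counter, and the reader keeps retrying
 * until it completes a pass with no invalidation in between. */
struct interval {
	unsigned long seq;
};

static unsigned long read_begin(const struct interval *iv)
{
	return iv->seq;
}

static bool read_retry(const struct interval *iv, unsigned long seq)
{
	return iv->seq != seq;	/* true: an invalidation raced us */
}

/* Simulate `racing` invalidations (e.g. unwanted CLEAR notifies) that
 * each force one extra pass; return how many passes were needed. */
static int passes_until_stable(int racing)
{
	struct interval iv = { 0 };
	int passes = 0;
	unsigned long seq;

	do {
		seq = read_begin(&iv);
		passes++;
		if (racing-- > 0)
			iv.seq++;	/* invalidation lands mid-pass */
	} while (read_retry(&iv, seq));

	return passes;
}
```

Each racing invalidation costs exactly one extra pass, which is why a CLEAR
notify fired on every attempt would be a problem.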

 - Alistair

> --
> Peter Xu
> 






Re: [PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Doug Anderson
Hi,

On Thu, Jun 10, 2021 at 4:01 PM Linus Walleij  wrote:
>
> On Fri, Jun 11, 2021 at 12:42 AM Doug Anderson  wrote:
> > On Thu, Jun 10, 2021 at 3:39 PM Linus Walleij  
> > wrote:
>
>
> > > #define mipi_dbi_command(dbi, cmd, seq...) \
> > > ({ \
> > > const u8 d[] = { seq }; \
> > > mipi_dbi_command_stackbuf(dbi, cmd, d, ARRAY_SIZE(d)); \
> > > })
> > >
> > > I'll fix up the include and apply then we can think about
> > > what to do with mipi_dbi_command().
> >
> > Are you sure that doesn't work? Isn't the return value of a macro the
> > last expression? In this case the return value of
> > mipi_dbi_command_stackbuf() should just flow through.
>
> w00t I didn't know that.
>
> And I like to think of the macro processor as essentially just
> inserting the content of the macro at the cursor.
>
> But arguably it *should* rather be fixed in this macro though?
> It is used in the same way in all other drivers as well.

You want the mipi_dbi_command() to do the error checking and print the
message? Two downsides:

1. What if someone didn't _want_ the message printed? They might want
to try to handle things more elegantly, like maybe fail the function?

2. Currently the mipi_dbi_command() macro doesn't have access to a
"dev" pointer so it wouldn't be able to print as nice of an error as
you can.

That being said, I wouldn't object to introducing a macro next to
mipi_dbi_command() that also takes a dev pointer, prints an error, and
doesn't return a value.
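
A standalone sketch of the language feature in question, GCC/clang statement
expressions, where ({ ... }) evaluates to its last expression (send_cmd() is
a made-up stand-in for mipi_dbi_command_stackbuf()):

```c
#include <assert.h>
#include <stddef.h>

/* A statement expression "returns" its last expression, so a macro
 * written this way yields a usable value even though it declares a
 * local buffer inside the block. */
static int send_cmd(int cmd, const unsigned char *buf, size_t len)
{
	(void)cmd;
	(void)buf;
	return (int)len;	/* pretend success: bytes accepted */
}

/* Same shape as the mipi_dbi_command() macro quoted above. */
#define SEND(cmd, seq...) \
	({ \
		const unsigned char d[] = { seq }; \
		send_cmd(cmd, d, sizeof(d) / sizeof(d[0])); \
	})
```

So the value of SEND(...) is whatever send_cmd() returned, which is why the
existing macro's result can already flow through to callers for error
checking.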

-Doug


Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Peter Xu
On Fri, Jun 11, 2021 at 12:21:26AM +1000, Alistair Popple wrote:
> > Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to 
> > explicit
> > call split_huge_pmd_address(), afaict.  Since both of them use 
> > __split_huge_pmd()
> > internally which will generate that unwanted CLEAR notify.
> 
> Agree that gup calls __split_huge_pmd() via split_huge_pmd_address()
> which will always CLEAR. However gup only calls split_huge_pmd_address() if it
> finds a thp pmd. In follow_pmd_mask() we have:
> 
>   if (likely(!pmd_trans_huge(pmdval)))
>   return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> 
> So I don't think we have a problem here.

Sorry I didn't follow here..  We do FOLL_SPLIT_PMD after this check, right?  I
mean, if it's a thp for the current mm, afaict pmd_trans_huge() should return
true above, so we'll skip follow_page_pte(); then we'll check FOLL_SPLIT_PMD
and do the split, then the CLEAR notify.  Hmm.. Did I miss something?

-- 
Peter Xu



Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-06-10 Thread Michal Wajdeczko



On 10.06.2021 22:46, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Various UMDs need to know the L3 bank count. So add a query API for it.
> 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
>  drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
>  drivers/gpu/drm/i915/i915_query.c  | 22 ++
>  drivers/gpu/drm/i915/i915_reg.h|  1 +
>  include/uapi/drm/i915_drm.h|  1 +
>  5 files changed, 40 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 2161bf01ef8b..708bb3581d83 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info 
> *info,
>  
>   intel_sseu_dump(&info->sseu, p);
>  }
> +
> +int intel_gt_get_l3bank_count(struct intel_gt *gt)
> +{
> + struct drm_i915_private *i915 = gt->i915;
> + intel_wakeref_t wakeref;
> + u32 fuse3;
> +
> + if (GRAPHICS_VER(i915) < 12)
> + return -ENODEV;
> +
> + with_intel_runtime_pm(gt->uncore->rpm, wakeref)
> + fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
> +
> + return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
> b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 7ec395cace69..46aa1cf4cf30 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
> *gt)
>  
>  void intel_gt_info_print(const struct intel_gt_info *info,
>struct drm_printer *p);
> +int intel_gt_get_l3bank_count(struct intel_gt *gt);
>  
>  void intel_gt_watchdog_work(struct work_struct *work);
>  
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index 96bd8fb3e895..0e92bb2d21b2 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -10,6 +10,7 @@
>  #include "i915_perf.h"
>  #include "i915_query.h"
>  #include 
> +#include "gt/intel_gt.h"
>  
>  static int copy_query_item(void *query_hdr, size_t query_sz,
>  u32 total_length,
> @@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
> *i915,
>   return hwconfig->size;
>  }
>  
> +static int query_l3banks(struct drm_i915_private *i915,
> +  struct drm_i915_query_item *query_item)
> +{
> + u32 banks;

likely we also need to check:

if (query_item->flags != 0)
return -EINVAL;

> +
> + if (query_item->length == 0)
> + return sizeof(banks);
> +
> + if (query_item->length < sizeof(banks))
> + return -EINVAL;
> +
> + banks = intel_gt_get_l3bank_count(&i915->gt);
> +
> + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
> +  &banks, sizeof(banks)))
> + return -EFAULT;
> +
> + return sizeof(banks);
> +}
> +
>  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   struct drm_i915_query_item *query_item) 
> = {
>   query_topology_info,
> @@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct 
> drm_i915_private *dev_priv,
>   query_perf_config,
>   query_memregion_info,
>   query_hwconfig_table,
> + query_l3banks,
>  };
>  
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *file)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index eb13c601d680..e9ba88fe3db7 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  #define  GEN10_MIRROR_FUSE3  _MMIO(0x9118)
>  #define GEN10_L3BANK_PAIR_COUNT 4
>  #define GEN10_L3BANK_MASK   0x0F
> +#define GEN12_GT_L3_MODE_MASK 0xFF
>  
>  #define GEN8_EU_DISABLE0 _MMIO(0x9134)
>  #define   GEN8_EU_DIS0_S0_MASK   0xff
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 87d369cae22a..20d18cca5066 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_PERF_CONFIG  3
>  #define DRM_I915_QUERY_MEMORY_REGIONS   4
>  #define DRM_I915_QUERY_HWCONFIG_TABLE   5
> +#define DRM_I915_QUERY_L3_BANK_COUNT6
>  /* Must be kept compact -- no holes and well documented */
>  
>   /**
> 

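
The bank count in the quoted patch is just a population count over the
inverted fuse field; a small userspace model (the mask width here is
illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define L3_MODE_MASK 0xFFu	/* illustrative 8-bit fuse field */

/* Portable popcount, standing in for the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}

/* The fuse bits mark *disabled* banks, hence the inversion before
 * counting, as in hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK,
 * ~fuse3)) in the quoted patch. */
static unsigned int l3_bank_count(uint32_t fuse3)
{
	return popcount32(~fuse3 & L3_MODE_MASK);
}
```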

Re: [PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Linus Walleij
On Fri, Jun 11, 2021 at 12:42 AM Doug Anderson  wrote:
> On Thu, Jun 10, 2021 at 3:39 PM Linus Walleij  
> wrote:


> > #define mipi_dbi_command(dbi, cmd, seq...) \
> > ({ \
> > const u8 d[] = { seq }; \
> > mipi_dbi_command_stackbuf(dbi, cmd, d, ARRAY_SIZE(d)); \
> > })
> >
> > I'll fix up the include and apply then we can think about
> > what to do with mipi_dbi_command().
>
> Are you sure that doesn't work? Isn't the return value of a macro the
> last expression? In this case the return value of
> mipi_dbi_command_stackbuf() should just flow through.

w00t I didn't know that.

And I like to think of the macro processor as essentially just
inserting the content of the macro at the cursor.

But arguably it *should* rather be fixed in this macro though?
It is used in the same way in all other drivers as well.

Yours,
Linus Walleij


Re: [PATCH v4 13/17] drm/i915/pxp: Enable PXP power management

2021-06-10 Thread Daniele Ceraolo Spurio




On 6/2/2021 9:20 AM, Rodrigo Vivi wrote:

On Mon, May 24, 2021 at 10:47:59PM -0700, Daniele Ceraolo Spurio wrote:

From: "Huang, Sean Z" 

During S3+ sleep/resume power events, the hardware will lose all the
encryption keys for every hardware session, even though the
session state might still be marked as alive after resume. Therefore,
we should consider the session as dead on suspend and invalidate all the
objects. The session will be automatically restarted on the first
protected submission on resume.

v2: runtime suspend also invalidates the keys
v3: fix return codes, simplify rpm ops (Chris), use the new worker func
v4: invalidate the objects on suspend, don't re-create the arb sesson on
resume (delayed to first submission).

Signed-off-by: Huang, Sean Z 
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Chris Wilson 
Cc: Rodrigo Vivi 
---
  drivers/gpu/drm/i915/Makefile|  1 +
  drivers/gpu/drm/i915/gt/intel_gt_pm.c| 15 +++-
  drivers/gpu/drm/i915/i915_drv.c  |  2 ++
  drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 11 --
  drivers/gpu/drm/i915/pxp/intel_pxp_pm.c  | 40 
  drivers/gpu/drm/i915/pxp/intel_pxp_pm.h  | 23 +++
  drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 38 ++-
  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |  9 +
  8 files changed, 124 insertions(+), 15 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_pm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 29331bbb3e98..9cce0bf9a50f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -278,6 +278,7 @@ i915-$(CONFIG_DRM_I915_PXP) += \
pxp/intel_pxp.o \
pxp/intel_pxp_cmd.o \
pxp/intel_pxp_irq.o \
+   pxp/intel_pxp_pm.o \
pxp/intel_pxp_session.o \
pxp/intel_pxp_tee.o
  
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c

index aef3084e8b16..91151a02f7a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -19,6 +19,7 @@
  #include "intel_rc6.h"
  #include "intel_rps.h"
  #include "intel_wakeref.h"
+#include "pxp/intel_pxp_pm.h"
  
  static void user_forcewake(struct intel_gt *gt, bool suspend)

  {
@@ -265,6 +266,8 @@ int intel_gt_resume(struct intel_gt *gt)
  
  	intel_uc_resume(&gt->uc);
  
 	intel_pxp_resume(&gt->pxp);

+
user_forcewake(gt, false);
  
  out_fw:

@@ -299,6 +302,7 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
user_forcewake(gt, true);
wait_for_suspend(gt);
  
+	intel_pxp_suspend(&gt->pxp);

	intel_uc_suspend(&gt->uc);
  }
  
@@ -349,6 +353,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
  
  void intel_gt_runtime_suspend(struct intel_gt *gt)

  {
+   intel_pxp_suspend(&gt->pxp);
	intel_uc_runtime_suspend(&gt->uc);
  
  	GT_TRACE(gt, "\n");

@@ -356,11 +361,19 @@ void intel_gt_runtime_suspend(struct intel_gt *gt)
  
  int intel_gt_runtime_resume(struct intel_gt *gt)

  {
+   int ret;
+
GT_TRACE(gt, "\n");
intel_gt_init_swizzling(gt);
intel_ggtt_restore_fences(gt->ggtt);
  
-	return intel_uc_runtime_resume(&gt->uc);

+   ret = intel_uc_runtime_resume(&gt->uc);
+   if (ret)
+   return ret;
+
+   intel_pxp_resume(&gt->pxp);
+
+   return 0;
  }
  
  static ktime_t __intel_gt_get_awake_time(const struct intel_gt *gt)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 2f06bb7b3ed2..6543e5577709 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -68,6 +68,8 @@
  #include "gt/intel_gt_pm.h"
  #include "gt/intel_rc6.h"
  
+#include "pxp/intel_pxp_pm.h"

+
  #include "i915_debugfs.h"
  #include "i915_drv.h"
  #include "i915_ioc32.h"
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index a230d0034e50..9e5847c653f2 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -9,6 +9,7 @@
  #include "gt/intel_gt_irq.h"
  #include "i915_irq.h"
  #include "i915_reg.h"
+#include "intel_runtime_pm.h"
  
  /**

   * intel_pxp_irq_handler - Handles PXP interrupts.
@@ -62,11 +63,13 @@ void intel_pxp_irq_enable(struct intel_pxp *pxp)
struct intel_gt *gt = pxp_to_gt(pxp);
  
  	spin_lock_irq(&gt->irq_lock);

-   if (!pxp->irq_enabled) {
+
+   if (!pxp->irq_enabled)
WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_KCR));
-   __pxp_set_interrupts(gt, GEN12_PXP_INTERRUPTS);
-   pxp->irq_enabled = true;
-   }
+
+   __pxp_set_interrupts(gt, GEN12_PXP_INTERRUPTS);
+   pxp->irq_enabled = true;

why?
and if we really need this maybe worth a squash on the other patch or a 
separated new one?


I had some troubles with the driver resetting all interrupts on S3 exit 
behind the PXP c

Re: [PATCH 2/3] drm/i915/uapi: Add query for hwconfig table

2021-06-10 Thread Michal Wajdeczko



On 10.06.2021 22:46, john.c.harri...@intel.com wrote:
> From: Rodrigo Vivi 
> 
> GuC contains a consolidated table with a bunch of information about the
> current device.
> 
> Previously, this information was spread across and hardcoded in all the components
> including GuC, i915 and various UMDs. The goal here is to consolidate
> the data into GuC in a way that all interested components can grab the
> very latest and synchronized information using a simple query.
> 
> As per most of the other queries, this one can be called twice.
> Once with item.length=0 to determine the exact buffer size, then
> allocate the user memory and call it again to retrieve the
> table data. For example:
>   struct drm_i915_query_item item = {
> .query_id = DRM_I915_QUERY_HWCONFIG_TABLE,
>   };
>   query.items_ptr = (int64_t) &item;
>   query.num_items = 1;
> 
>   ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> 
>   if (item.length <= 0)
> return -ENOENT;
> 
>   data = malloc(item.length);
>   item.data_ptr = (int64_t) data;
>   ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> 
>   // Parse the data as appropriate...
> 
> The returned array is a simple and flexible KLV (Key/Length/Value)
> formatted table. For example, it could be just:
>   enum device_attr {
>  ATTR_SOME_VALUE = 0,
>  ATTR_SOME_MASK  = 1,
>   };
> 
>   static const u32 hwconfig[] = {
>   ATTR_SOME_VALUE,
>   1, // Value Length in DWords
>   8, // Value
> 
>   ATTR_SOME_MASK,
>   3,
>   0x00, 0x, 0xFF00,
>   };

the same example was already added to the code in the previous patch;
maybe just refer to that documentation?
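
For illustration, a minimal userspace walker for a KLV table of the shape
described in the commit message (the attribute ids and values below are made
up, not from the hardware spec):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Walk a Key/Length/Value table of u32s, i.e.
 * [key, len-in-dwords, <len> value dwords]...  Returns the value length
 * for `key` and sets *out to its first dword, or -1 if the key is
 * absent or an entry runs past the end of the table. */
static int klv_find(const uint32_t *tbl, size_t ndwords, uint32_t key,
		    const uint32_t **out)
{
	size_t i = 0;

	while (i + 2 <= ndwords) {
		uint32_t len = tbl[i + 1];

		if (i + 2 + len > ndwords)
			return -1;		/* truncated entry */
		if (tbl[i] == key) {
			*out = &tbl[i + 2];
			return (int)len;
		}
		i += 2 + len;
	}
	return -1;
}

/* A table in the style of the commit-message example (ids invented). */
static const uint32_t demo_tbl[] = {
	0, 1, 8,				/* ATTR_SOME_VALUE */
	1, 3, 0x00, 0x0000FFFF, 0xFF00FF00,	/* ATTR_SOME_MASK */
};

static int demo_len(uint32_t key)
{
	const uint32_t *v;

	return klv_find(demo_tbl, sizeof(demo_tbl) / 4, key, &v);
}

static uint32_t demo_first(uint32_t key)
{
	const uint32_t *v;

	return klv_find(demo_tbl, sizeof(demo_tbl) / 4, key, &v) > 0 ? v[0] : 0;
}
```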

> 
> The attribute ids are defined in a hardware spec. The current list as
> known to the i915 driver can be found in i915/gt/intel_guc_hwconfig_types.h

previous patch introduced i915/gt/intel_hwconfig_types.h

also, i915 seems to be not using any/many of them directly, so it could
happen that GuC will return new/updated klvs, so shouldn't we make this
klv list more external and maybe even define as uabi header?

> 
> Cc: Tvrtko Ursulin 
> Cc: Kenneth Graunke 
> Cc: Michal Wajdeczko 
> Cc: Slawomir Milczarek 
> Signed-off-by: Rodrigo Vivi 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/i915_query.c | 23 +++
>  include/uapi/drm/i915_drm.h   |  1 +
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index e49da36c62fb..96bd8fb3e895 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -480,12 +480,35 @@ static int query_memregion_info(struct drm_i915_private 
> *i915,
>   return total_length;
>  }
>  
> +static int query_hwconfig_table(struct drm_i915_private *i915,
> + struct drm_i915_query_item *query_item)
> +{
> + struct intel_gt *gt = &i915->gt;
> + struct intel_guc_hwconfig *hwconfig = &gt->uc.guc.hwconfig;
> +
> + if (!hwconfig->size || !hwconfig->ptr)
> + return -ENODEV;

shouldn't we also have:

if (query_item->flags != 0)
return -EINVAL;

> +
> + if (query_item->length == 0)
> + return hwconfig->size;
> +
> + if (query_item->length < hwconfig->size)
> + return -EINVAL;
> +
> + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
> +  hwconfig->ptr, hwconfig->size))
> + return -EFAULT;
> +
> + return hwconfig->size;
> +}
> +
>  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   struct drm_i915_query_item *query_item) 
> = {
>   query_topology_info,
>   query_engine_info,
>   query_perf_config,
>   query_memregion_info,
> + query_hwconfig_table,
>  };
>  
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index c2c7759b7d2e..87d369cae22a 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2233,6 +2233,7 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_ENGINE_INFO   2
>  #define DRM_I915_QUERY_PERF_CONFIG  3
>  #define DRM_I915_QUERY_MEMORY_REGIONS   4
> +#define DRM_I915_QUERY_HWCONFIG_TABLE   5

hmm, not sure if this single line satisfies the "well documented"
requirement below ;)

>  /* Must be kept compact -- no holes and well documented */
>  
>   /**
> 


Re: [Intel-gfx] [PATCH v4 12/17] drm/i915/pxp: start the arb session on demand

2021-06-10 Thread Daniele Ceraolo Spurio




On 6/2/2021 11:14 AM, Rodrigo Vivi wrote:

On Mon, May 24, 2021 at 10:47:58PM -0700, Daniele Ceraolo Spurio wrote:

Now that we can handle destruction and re-creation of the arb session,
we can postpone the start of the session to the first submission that
requires it, to avoid keeping it running with no user.
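
The lazy-start pattern described above can be sketched in isolation
(hypothetical names, mirroring the shape of the arb_is_valid check taken
under arb_mutex in intel_pxp_start()):

```c
#include <assert.h>
#include <stdbool.h>

/* On-demand session start: only the first user after an invalidation
 * pays the restart cost; later users see the session already valid. */
struct session {
	bool valid;
	int starts;	/* how many times we actually (re)started */
};

static void session_start(struct session *s)
{
	/* in the driver this check runs under pxp->arb_mutex */
	if (s->valid)
		return;
	s->starts++;	/* expensive terminate-and-restart happens here */
	s->valid = true;
}

static void session_invalidate(struct session *s)
{
	s->valid = false;	/* e.g. on suspend, when the keys are lost */
}

/* Two submissions, a suspend/resume cycle, then another submission. */
static int demo_restart_count(void)
{
	struct session s = { false, 0 };

	session_start(&s);	/* first submission: starts the session */
	session_start(&s);	/* second submission: already valid, no-op */
	session_invalidate(&s);	/* suspend: session considered dead */
	session_start(&s);	/* first submission after resume: restarts */
	return s.starts;
}
```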

Signed-off-by: Daniele Ceraolo Spurio 
---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  8 ++--
  drivers/gpu/drm/i915/pxp/intel_pxp.c  | 37 ---
  drivers/gpu/drm/i915/pxp/intel_pxp.h  |  4 +-
  drivers/gpu/drm/i915/pxp/intel_pxp_irq.c  |  2 +-
  drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |  6 +--
  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c  | 10 +
  drivers/gpu/drm/i915/pxp/intel_pxp_types.h|  3 ++
  7 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index a11e9d5767bf..c08e28847064 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2948,9 +2948,11 @@ eb_select_engine(struct i915_execbuffer *eb)
intel_gt_pm_get(ce->engine->gt);
  
  	if (i915_gem_context_uses_protected_content(eb->gem_context)) {

-   err = intel_pxp_wait_for_arb_start(&ce->engine->gt->pxp);
-   if (err)
-   goto err;
+   if (!intel_pxp_is_active(&ce->engine->gt->pxp)) {
+   err = intel_pxp_start(&ce->engine->gt->pxp);
+   if (err)
+   goto err;
+   }
  
  		if (i915_gem_context_invalidated(eb->gem_context)) {

err = -EACCES;
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index f713d3423cea..2291c68fd3a0 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -77,6 +77,7 @@ void intel_pxp_init(struct intel_pxp *pxp)
init_completion(&pxp->termination);
complete_all(&pxp->termination);
  
+	mutex_init(&pxp->arb_mutex);

INIT_WORK(&pxp->session_work, intel_pxp_session_work);
  
  	ret = create_vcs_context(pxp);

@@ -113,7 +114,7 @@ void intel_pxp_mark_termination_in_progress(struct 
intel_pxp *pxp)
reinit_completion(&pxp->termination);
  }
  
-static void intel_pxp_queue_termination(struct intel_pxp *pxp)

+static void pxp_queue_termination(struct intel_pxp *pxp)
  {
struct intel_gt *gt = pxp_to_gt(pxp);
  
@@ -132,31 +133,41 @@ static void intel_pxp_queue_termination(struct intel_pxp *pxp)

   * the arb session is restarted from the irq work when we receive the
   * termination completion interrupt
   */
-int intel_pxp_wait_for_arb_start(struct intel_pxp *pxp)
+int intel_pxp_start(struct intel_pxp *pxp)
  {
+   int ret = 0;
+
if (!intel_pxp_is_enabled(pxp))
-   return 0;
+   return -ENODEV;
+
+   mutex_lock(&pxp->arb_mutex);
+
+   if (pxp->arb_is_valid)
+   goto unlock;
+
+   pxp_queue_termination(pxp);
  
  	if (!wait_for_completion_timeout(&pxp->termination,

-msecs_to_jiffies(100)))
-   return -ETIMEDOUT;
+   msecs_to_jiffies(100))) {
+   ret = -ETIMEDOUT;
+   goto unlock;
+   }
+
+   /* make sure the compiler doesn't optimize the double access */
+   barrier();
  
  	if (!pxp->arb_is_valid)

-   return -EIO;
+   ret = -EIO;
  
-	return 0;

+unlock:
+   mutex_unlock(&pxp->arb_mutex);
+   return ret;
  }
  
  void intel_pxp_init_hw(struct intel_pxp *pxp)

  {
kcr_pxp_enable(pxp_to_gt(pxp));
intel_pxp_irq_enable(pxp);
-
-   /*
-* the session could've been attacked while we weren't loaded, so
-* handle it as if it was and re-create it.
-*/
-   intel_pxp_queue_termination(pxp);
  }
  
  void intel_pxp_fini_hw(struct intel_pxp *pxp)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp.h
index 91c1a2056309..1f9871e64096 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.h
@@ -32,7 +32,7 @@ void intel_pxp_init_hw(struct intel_pxp *pxp);
  void intel_pxp_fini_hw(struct intel_pxp *pxp);
  
  void intel_pxp_mark_termination_in_progress(struct intel_pxp *pxp);

-int intel_pxp_wait_for_arb_start(struct intel_pxp *pxp);
+int intel_pxp_start(struct intel_pxp *pxp);
  void intel_pxp_invalidate(struct intel_pxp *pxp);
  #else
  static inline void intel_pxp_init(struct intel_pxp *pxp)
@@ -43,7 +43,7 @@ static inline void intel_pxp_fini(struct intel_pxp *pxp)
  {
  }
  
-static inline int intel_pxp_wait_for_arb_start(struct intel_pxp *pxp)

+static inline int intel_pxp_start(struct intel_pxp *pxp)
  {
return 0;
  }
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i91

Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Alistair Popple
On Friday, 11 June 2021 4:04:35 AM AEST Peter Xu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Thu, Jun 10, 2021 at 10:18:25AM +1000, Alistair Popple wrote:
> > > > The main problem is split_huge_pmd_address() unconditionally calls a mmu
> > > > notifier so I would need to plumb in passing an owner everywhere which 
> > > > could
> > > > get messy.
> > >
> > > Could I ask why?  split_huge_pmd_address() will notify with CLEAR, so I'm 
> > > a bit
> > > confused why we need to pass over the owner.
> >
> > Sure, it is the same reason we need to pass it for the exclusive notifier.
> > Any invalidation during the make exclusive operation will break the mmu read
> > side critical section forcing a retry of the operation. The owner field is 
> > what
> > is used to filter out invalidations (such as the exclusive invalidation) 
> > that
> > don't need to be retried.
> 
> Do you mean the mmu_interval_read_begin|retry() calls?

Yep.

> Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to 
> explicit
> call split_huge_pmd_address(), afaict.  Since both of them use 
> __split_huge_pmd()
> internally which will generate that unwanted CLEAR notify.

Agreed that gup calls __split_huge_pmd() via split_huge_pmd_address(),
which will always send the CLEAR notify. However gup only calls
split_huge_pmd_address() if it finds a thp pmd. In follow_pmd_mask() we have:

if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);

So I don't think we have a problem here.

> If that's the case, I think it fails because split_huge_pmd_address() will
> trigger that CLEAR notify unconditionally (even if it's not a thp; not sure
> whether it should be optimized to not notify at all... definitely another
> story), while FOLL_SPLIT_PMD will skip the notify as it calls split_huge_pmd()
> instead, who checks the pmd before calling __split_huge_pmd().
> 
> Does it also mean that if there's a real THP it won't really work?  As then
> FOLL_SPLIT_PMD will start to trigger that CLEAR notify too, I think..
> 
> --
> Peter Xu
> 

Re: [PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Doug Anderson
Hi,

On Thu, Jun 10, 2021 at 3:39 PM Linus Walleij  wrote:
>
> On Fri, Jun 11, 2021 at 12:30 AM Doug Anderson  wrote:
>
> > > +   mipi_dbi_command(dbi, MIPI_DCS_SET_ADDRESS_MODE, 0x0a);
> >
> > I would still prefer it if there was some type of error checking since
> > SPI commands can fail and could potentially fail silently. What about
> > at least this (untested):
> >
> > #define db7430_dbi_cmd(_db, _cmd, _seq...) \
> >   do {
> > int _ret = mipi_dbi_command(_db->dbi, _cmd, _seq);
> > if (_ret)
> >   dev_warn(_db->dev, "DBI cmd %d failed (%d)\n", _cmd, _ret);
> >   } while (0)
> >
> > Then at least you know _something_ will show up in the logs if there's
> > a transfer failure instead of silence?
> >
> > If you truly don't want the error checking then I guess I won't
> > insist, but it feels like the kind of thing that will bite someone
> > eventually... In any case, I'm happy to add this now (especially since
> > the DBI stuff is Acked now).
>
> This looks more like something that should be done in
> mipi_dbi_command() in include/drm/drm_mipi_dbi.h
> which claims:
>
>  * Returns:
>  * Zero on success, negative error code on failure.
>  */
>
> But no it does not return anything:
>
> #define mipi_dbi_command(dbi, cmd, seq...) \
> ({ \
> const u8 d[] = { seq }; \
> mipi_dbi_command_stackbuf(dbi, cmd, d, ARRAY_SIZE(d)); \
> })
>
> I'll fix up the include and apply then we can think about
> what to do with mipi_dbi_command().

Are you sure that doesn't work? Isn't the return value of a macro the
last expression? In this case the return value of
mipi_dbi_command_stackbuf() should just flow through.

-Doug


Re: [PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Linus Walleij
On Fri, Jun 11, 2021 at 12:30 AM Doug Anderson  wrote:

> > +   mipi_dbi_command(dbi, MIPI_DCS_SET_ADDRESS_MODE, 0x0a);
>
> I would still prefer it if there was some type of error checking since
> SPI commands can fail and could potentially fail silently. What about
> at least this (untested):
>
> #define db7430_dbi_cmd(_db, _cmd, _seq...) \
>   do {
> int _ret = mipi_dbi_command(_db->dbi, _cmd, _seq);
> if (_ret)
>   dev_warn(_db->dev, "DBI cmd %d failed (%d)\n", _cmd, _ret);
>   } while (0)
>
> Then at least you know _something_ will show up in the logs if there's
> a transfer failure instead of silence?
>
> If you truly don't want the error checking then I guess I won't
> insist, but it feels like the kind of thing that will bite someone
> eventually... In any case, I'm happy to add this now (especially since
> the DBI stuff is Acked now).

This looks more like something that should be done in
mipi_dbi_command() in include/drm/drm_mipi_dbi.h
which claims:

 * Returns:
 * Zero on success, negative error code on failure.
 */

But no it does not return anything:

#define mipi_dbi_command(dbi, cmd, seq...) \
({ \
const u8 d[] = { seq }; \
mipi_dbi_command_stackbuf(dbi, cmd, d, ARRAY_SIZE(d)); \
})

I'll fix up the include and apply then we can think about
what to do with mipi_dbi_command().

Yours,
Linus Walleij


Re: [PATCH 1/3] drm/i915/guc: Add fetch of hwconfig table

2021-06-10 Thread Michal Wajdeczko



On 10.06.2021 22:46, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Implement support for fetching the hardware description table from the
> GuC. The call is made twice - once without a destination buffer to
> query the size and then a second time to fill in the buffer.
> 
> This patch also adds a header file which lists all the attribute values
> currently defined for the table. This is included for reference as
> these are not currently used by the i915 driver itself.
> 
> Note that the table is only available on ADL-P and later platforms.
> 
> Cc: Michal Wajdeczko 
> Signed-off-by: Rodrigo Vivi 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  .../gpu/drm/i915/gt/intel_hwconfig_types.h| 102 +++
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
>  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c|   3 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 167 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 ++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c |   6 +
>  9 files changed, 304 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 2adb6b420c7c..8e957ca7c9f1 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -187,6 +187,7 @@ i915-y += gt/uc/intel_uc.o \
> gt/uc/intel_guc_log.o \
> gt/uc/intel_guc_log_debugfs.o \
> gt/uc/intel_guc_submission.o \
> +   gt/uc/intel_guc_hwconfig.o \
> gt/uc/intel_huc.o \
> gt/uc/intel_huc_debugfs.o \
> gt/uc/intel_huc_fw.o
> diff --git a/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h 
> b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
> new file mode 100644
> index ..b09c0f65b93a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2020 Intel Corporation

it's already June'21

> + */
> +
> +#ifndef _INTEL_HWCONFIG_TYPES_H_
> +#define _INTEL_HWCONFIG_TYPES_H_
> +
> +/**
> + * enum intel_hwconfig - Global definition of hwconfig table attributes
> + *
> + * Intel devices provide a KLV (Key/Length/Value) table containing
> + * the static hardware configuration for that platform.
> + * This header defines the current attribute keys for this KLV.

s/header/enum

as this is enum documentation

likely separate DOC: section with explanation of the whole HW KLV
concept could be helpful if plugged into i915 rst

> + */
> +enum intel_hwconfig {
> + INTEL_HWCONFIG_MAX_SLICES_SUPPORTED = 1,
> + INTEL_HWCONFIG_MAX_DUAL_SUBSLICES_SUPPORTED,/* 2 */

putting estimated enum values in comments could be misleading: if
someone accidentally adds an enum in the middle, all the values in the
comments will be stale

if you really want stable definitions, without risking accidental
breakage, then better to define enums with explicit value, like you did
for the first one:

INTEL_HWCONFIG_MAX_SLICES_SUPPORTED = 1,
INTEL_HWCONFIG_MAX_DUAL_SUBSLICES_SUPPORTED = 2,
INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS = 3,
...

> + INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS,  /* 3 */
> + INTEL_HWCONFIG_NUM_PIXEL_PIPES, /* 4 */
> + INTEL_HWCONFIG_DEPRECATED_MAX_NUM_GEOMETRY_PIPES,   /* 5 */
> + INTEL_HWCONFIG_DEPRECATED_L3_CACHE_SIZE_IN_KB,  /* 6 */
> + INTEL_HWCONFIG_DEPRECATED_L3_BANK_COUNT,/* 7 */

what's the meaning of the 'deprecated' here ?

if not used in ADLP and beyond, then I guess no reason to define them.
just skip these numbers:

INTEL_HWCONFIG_NUM_PIXEL_PIPES = 4,
/* 5-7 not used/reserved/deprecated */
INTEL_HWCONFIG_L3_CACHE_WAYS_SIZE_IN_BYTES = 8,

> + INTEL_HWCONFIG_L3_CACHE_WAYS_SIZE_IN_BYTES, /* 8 */
> + INTEL_HWCONFIG_L3_CACHE_WAYS_PER_SECTOR,/* 9 */
> + INTEL_HWCONFIG_MAX_MEMORY_CHANNELS, /* 10 */
> + INTEL_HWCONFIG_MEMORY_TYPE, /* 11 */
> + INTEL_HWCONFIG_CACHE_TYPES, /* 12 */
> + INTEL_HWCONFIG_LOCAL_MEMORY_PAGE_SIZES_SUPPORTED,   /* 13 */
> + INTEL_HWCONFIG_DEPRECATED_SLM_SIZE_IN_KB,   /* 14 */
> + INTEL_HWCONFIG_NUM_THREADS_PER_EU,  /* 15 */
> + INTEL_HWCONFIG_TOTAL_VS_THREADS,/* 16 */
> + INTEL_HWCONFIG_TOTAL_GS_THREADS,/* 17 */
> + INTEL_HWCONFIG_TOTAL_HS_THREADS,/* 18 */
> + INTEL_HWCONFIG_TOTAL_DS

Re: [PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Doug Anderson
Hi,

On Thu, Jun 10, 2021 at 3:07 PM Linus Walleij  wrote:
>
> @@ -0,0 +1,347 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Panel driver for the Samsung LMS397KF04 480x800 DPI RGB panel.
> + * According to the data sheet the display controller is called DB7430.
> + * Found in the Samsung Galaxy Beam GT-I8350 mobile phone.
> + * Linus Walleij 
> + */
> +#include 
> +#include 

nit that "mipi" sorts before "modes"


> +static int db7430_power_on(struct db7430 *db)
> +{
> +   struct mipi_dbi *dbi = &db->dbi;
> +   int ret;
> +
> +   /* Power up */
> +   ret = regulator_bulk_enable(ARRAY_SIZE(db->regulators),
> +   db->regulators);
> +   if (ret) {
> +   dev_err(db->dev, "failed to enable regulators: %d\n", ret);
> +   return ret;
> +   }
> +   msleep(50);
> +
> +   /* Assert reset >=1 ms */
> +   gpiod_set_value_cansleep(db->reset, 1);
> +   usleep_range(1000, 5000);
> +   /* De-assert reset */
> +   gpiod_set_value_cansleep(db->reset, 0);
> +   /* Wait >= 10 ms */
> +   msleep(10);
> +   dev_dbg(db->dev, "de-asserted RESET\n");
> +
> +   /*
> +* This is set to 0x0a (RGB/BGR order + horizontal flip) in order
> +* to make the display behave normally. If this is not set the 
> displays
> +* normal output behaviour is horizontally flipped and BGR ordered. Do
> +* it twice because the first message doesn't always "take".
> +*/
> +   mipi_dbi_command(dbi, MIPI_DCS_SET_ADDRESS_MODE, 0x0a);

I would still prefer it if there was some type of error checking since
SPI commands can fail and could potentially fail silently. What about
at least this (untested):

#define db7430_dbi_cmd(_db, _cmd, _seq...) \
  do {
int _ret = mipi_dbi_command(_db->dbi, _cmd, _seq);
if (_ret)
  dev_warn(_db->dev, "DBI cmd %d failed (%d)\n", _cmd, _ret);
  } while (0)

Then at least you know _something_ will show up in the logs if there's
a transfer failure instead of silence?

If you truly don't want the error checking then I guess I won't
insist, but it feels like the kind of thing that will bite someone
eventually... In any case, I'm happy to add this now (especially since
the DBI stuff is Acked now).

Reviewed-by: Douglas Anderson 

I presume that you'd commit it to drm-misc yourself?

-Doug


Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-06-10 Thread Matthew Brost
On Thu, Jun 10, 2021 at 01:46:26PM -0700, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Various UMDs need to know the L3 bank count. So add a query API for it.
> 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
>  drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
>  drivers/gpu/drm/i915/i915_query.c  | 22 ++
>  drivers/gpu/drm/i915/i915_reg.h|  1 +
>  include/uapi/drm/i915_drm.h|  1 +
>  5 files changed, 40 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 2161bf01ef8b..708bb3581d83 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info 
> *info,
>  
>   intel_sseu_dump(&info->sseu, p);
>  }
> +
> +int intel_gt_get_l3bank_count(struct intel_gt *gt)

Small nit, this function is ..'l3bank_count' while the define for query
is ..'L3_BANK_COUNT'. I'm thinking this function should have an
underscore between l3 & bank for consistency.

> +{
> + struct drm_i915_private *i915 = gt->i915;
> + intel_wakeref_t wakeref;
> + u32 fuse3;
> +
> + if (GRAPHICS_VER(i915) < 12)
> + return -ENODEV;
> +
> + with_intel_runtime_pm(gt->uncore->rpm, wakeref)
> + fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
> +
> + return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
> b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 7ec395cace69..46aa1cf4cf30 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
> *gt)
>  
>  void intel_gt_info_print(const struct intel_gt_info *info,
>struct drm_printer *p);
> +int intel_gt_get_l3bank_count(struct intel_gt *gt);
>  
>  void intel_gt_watchdog_work(struct work_struct *work);
>  
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index 96bd8fb3e895..0e92bb2d21b2 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -10,6 +10,7 @@
>  #include "i915_perf.h"
>  #include "i915_query.h"
>  #include 
> +#include "gt/intel_gt.h"
>  
>  static int copy_query_item(void *query_hdr, size_t query_sz,
>  u32 total_length,
> @@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
> *i915,
>   return hwconfig->size;
>  }
>  
> +static int query_l3banks(struct drm_i915_private *i915,
> +  struct drm_i915_query_item *query_item)
> +{
> + u32 banks;
> +
> + if (query_item->length == 0)
> + return sizeof(banks);
> +
> + if (query_item->length < sizeof(banks))
> + return -EINVAL;
> +
> + banks = intel_gt_get_l3bank_count(&i915->gt);
> +
> + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
> +  &banks, sizeof(banks)))
> + return -EFAULT;
> +
> + return sizeof(banks);
> +}
> +
>  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   struct drm_i915_query_item *query_item) 
> = {
>   query_topology_info,
> @@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct 
> drm_i915_private *dev_priv,
>   query_perf_config,
>   query_memregion_info,
>   query_hwconfig_table,
> + query_l3banks,

Another nit, for consistency query_l3banks -> query_l3_bank_count.

With these nits fixed:
Reviewed-by: Matthew Brost 

>  };
>  
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *file)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index eb13c601d680..e9ba88fe3db7 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  #define  GEN10_MIRROR_FUSE3  _MMIO(0x9118)
>  #define GEN10_L3BANK_PAIR_COUNT 4
>  #define GEN10_L3BANK_MASK   0x0F
> +#define GEN12_GT_L3_MODE_MASK 0xFF
>  
>  #define GEN8_EU_DISABLE0 _MMIO(0x9134)
>  #define   GEN8_EU_DIS0_S0_MASK   0xff
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 87d369cae22a..20d18cca5066 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_PERF_CONFIG  3
>  #define DRM_I915_QUERY_MEMORY_REGIONS   4
>  #define DRM_I915_QUERY_HWCONFIG_TABLE   5
> +#define DRM_I915_QUERY_L3_BANK_COUNT6
>  /* Must be kept compact -- no holes and well documented */
>  
>   /**
> -- 
> 2.25.1
> 

[PATCH v5] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Linus Walleij
This adds a new driver for the Samsung DB7430 DPI display
controller as controlled over SPI.

Right now the only panel product we know that is using this
display controller is the LMS397KF04 but there may be more.

This is the first regular panel driver making use of the
MIPI DBI helper library. The DBI "device" portions can not
be used because that code assumes the use of a single
regulator and specific timings around the reset pulse that
do not match the DB7430 datasheet.

Cc: Paul Cercueil 
Cc: Doug Anderson 
Acked-by: Noralf Trønnes 
Signed-off-by: Linus Walleij 
---
ChangeLog v4->v5:
- Drop the SPI hardcoding to 9 BPW, so the DBI emulation
  will be utilized if needed.
- Drop the MODE_3 setting: we can set this up in the DTS
  instead.
- Collect Noralf's ACK.
ChangeLog v3->v4:
- Managed to make use of the in-kernel DBI support to
  conjure and send 9bit DBI SPI messages.
- This cuts out a bit of overhead.
- Deeper integration with the DBI library is not done, as it
  assumes too much about the device, such as being used
  as a stand-alone framebuffer (this device is not).
ChangeLog v2->v3:
- Fix some minor comments and formatting.
- Print an error if the DCS sequence write fails.
- Set BPC (bits per color) to 8 on the display_info. Some
  drivers may need this.
ChangeLog v1->v2:
- Rename driver and variables with the db7430_* prefix instead
  of lms397kf04 as there may be more display products out there
  using this display controller. Also change Kconfig symbol.
- Push doc comments together on one line where possible.
- Rename DB7430_MANUFACTURER_CMD to DB7430_ACCESS_PROT_OFF
- Return error from regulator_bulk_disable() down to
  db7430_unprepare() and propagate.
- Use usleep_range(1000, 5000) instead of msleep(1)
- Shorten prepare/unprepare callbacks by more compact code.
- Use devm_err_probe() and provide proper probe errors all
  the way through the probe() function.
---
 MAINTAINERS  |   7 +
 drivers/gpu/drm/panel/Kconfig|  10 +
 drivers/gpu/drm/panel/Makefile   |   1 +
 drivers/gpu/drm/panel/panel-samsung-db7430.c | 347 +++
 4 files changed, 365 insertions(+)
 create mode 100644 drivers/gpu/drm/panel/panel-samsung-db7430.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bd7aff0c120f..6ff4777b1018 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5857,6 +5857,13 @@ S:   Maintained
 F: Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml
 F: drivers/gpu/drm/panel/panel-raydium-rm67191.c
 
+DRM DRIVER FOR SAMSUNG DB7430 PANELS
+M: Linus Walleij 
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: Documentation/devicetree/bindings/display/panel/samsung,lms397kf04.yaml
+F: drivers/gpu/drm/panel/panel-samsung-db7430.c
+
 DRM DRIVER FOR SITRONIX ST7703 PANELS
 M: Guido Günther 
 R: Purism Kernel Team 
diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index 4894913936e9..6d1b90f4f2bb 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -342,6 +342,16 @@ config DRM_PANEL_RONBO_RB070D30
  Say Y here if you want to enable support for Ronbo Electronics
  RB070D30 1024x600 DSI panel.
 
+config DRM_PANEL_SAMSUNG_DB7430
+   tristate "Samsung DB7430-based DPI panels"
+   depends on OF && SPI && GPIOLIB
+   depends on BACKLIGHT_CLASS_DEVICE
+   select DRM_MIPI_DBI
+   help
+ Say Y here if you want to enable support for the Samsung
+ DB7430 DPI display controller used in such devices as the
+ LMS397KF04 480x800 DPI panel.
+
 config DRM_PANEL_SAMSUNG_S6D16D0
tristate "Samsung S6D16D0 DSI video mode panel"
depends on OF
diff --git a/drivers/gpu/drm/panel/Makefile b/drivers/gpu/drm/panel/Makefile
index cae4d976c069..a350e0990d17 100644
--- a/drivers/gpu/drm/panel/Makefile
+++ b/drivers/gpu/drm/panel/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN) += 
panel-raspberrypi-touchscreen
 obj-$(CONFIG_DRM_PANEL_RAYDIUM_RM67191) += panel-raydium-rm67191.o
 obj-$(CONFIG_DRM_PANEL_RAYDIUM_RM68200) += panel-raydium-rm68200.o
 obj-$(CONFIG_DRM_PANEL_RONBO_RB070D30) += panel-ronbo-rb070d30.o
+obj-$(CONFIG_DRM_PANEL_SAMSUNG_DB7430) += panel-samsung-db7430.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_LD9040) += panel-samsung-ld9040.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6D16D0) += panel-samsung-s6d16d0.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6E3HA2) += panel-samsung-s6e3ha2.o
diff --git a/drivers/gpu/drm/panel/panel-samsung-db7430.c 
b/drivers/gpu/drm/panel/panel-samsung-db7430.c
new file mode 100644
index ..fe58263bd9cd
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-samsung-db7430.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Panel driver for the Samsung LMS397KF04 480x800 DPI RGB panel.
+ * According to the data sheet the display controller is called DB7430.
+ * Found in the Samsung Galaxy Beam GT-I8350 mobile phone.
+ * Lin

Re: [PATCH v4] drm/panel: db7430: Add driver for Samsung DB7430

2021-06-10 Thread Linus Walleij
Hi Noralf,

thanks for the review. Doug poked me with something sharp
until I finally complied and started to use the DBI library.
Now I have to convert the other 9bpw DBI type displays
I have.

On Thu, Jun 10, 2021 at 6:15 PM Noralf Trønnes  wrote:

> > + /** @reset: reset GPIO line */
> > + struct gpio_desc *reset;
>
> You can use dbi->reset.

I fixed all except this: it just feels weird to use the struct mipi_dbi
for storing the reset line when the DBI core performs no operations on
it. I think this and the regulator should possibly be moved over to
struct mipi_dbi_dev, where the regulator is.

Yours,
Linus Walleij


Re: [PATCH 2/3] drm/i915/uapi: Add query for hwconfig table

2021-06-10 Thread Matthew Brost
On Thu, Jun 10, 2021 at 01:46:25PM -0700, john.c.harri...@intel.com wrote:
> From: Rodrigo Vivi 
> 
> GuC contains a consolidated table with a bunch of information about the
> current device.
> 
> Previously, this information was spread and hardcoded to all the components
> including GuC, i915 and various UMDs. The goal here is to consolidate
> the data into GuC in a way that all interested components can grab the
> very latest and synchronized information using a simple query.
> 
> As per most of the other queries, this one can be called twice.
> Once with item.length=0 to determine the exact buffer size, then
> allocate the user memory and call it again to retrieve the
> table data. For example:
>   struct drm_i915_query_item item = {
> .query_id = DRM_I915_QUERY_HWCONFIG_TABLE;
>   };
>   query.items_ptr = (int64_t) &item;
>   query.num_items = 1;
> 
>   ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query));
> 
>   if (item.length <= 0)
> return -ENOENT;
> 
>   data = malloc(item.length);
> item.data_ptr = (int64_t) data;
>   ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query));
> 
>   // Parse the data as appropriate...
> 
> The returned array is a simple and flexible KLV (Key/Length/Value)
> formatted table. For example, it could be just:
>   enum device_attr {
>  ATTR_SOME_VALUE = 0,
>  ATTR_SOME_MASK  = 1,
>   };
> 
>   static const u32 hwconfig[] = {
>   ATTR_SOME_VALUE,
>   1, // Value Length in DWords
>   8, // Value
> 
>   ATTR_SOME_MASK,
>   3,
>   0x00, 0x, 0xFF00,
>   };
> 
> The attribute ids are defined in a hardware spec. The current list as
> known to the i915 driver can be found in i915/gt/intel_guc_hwconfig_types.h
> 
> Cc: Tvrtko Ursulin 
> Cc: Kenneth Graunke 
> Cc: Michal Wajdeczko 
> Cc: Slawomir Milczarek 
> Signed-off-by: Rodrigo Vivi 
> Signed-off-by: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  drivers/gpu/drm/i915/i915_query.c | 23 +++
>  include/uapi/drm/i915_drm.h   |  1 +
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index e49da36c62fb..96bd8fb3e895 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -480,12 +480,35 @@ static int query_memregion_info(struct drm_i915_private 
> *i915,
>   return total_length;
>  }
>  
> +static int query_hwconfig_table(struct drm_i915_private *i915,
> + struct drm_i915_query_item *query_item)
> +{
> + struct intel_gt *gt = &i915->gt;
> + struct intel_guc_hwconfig *hwconfig = >->uc.guc.hwconfig;
> +
> + if (!hwconfig->size || !hwconfig->ptr)
> + return -ENODEV;
> +
> + if (query_item->length == 0)
> + return hwconfig->size;
> +
> + if (query_item->length < hwconfig->size)
> + return -EINVAL;
> +
> + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
> +  hwconfig->ptr, hwconfig->size))
> + return -EFAULT;
> +
> + return hwconfig->size;
> +}
> +
>  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
>   struct drm_i915_query_item *query_item) 
> = {
>   query_topology_info,
>   query_engine_info,
>   query_perf_config,
>   query_memregion_info,
> + query_hwconfig_table,
>  };
>  
>  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *file)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index c2c7759b7d2e..87d369cae22a 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2233,6 +2233,7 @@ struct drm_i915_query_item {
>  #define DRM_I915_QUERY_ENGINE_INFO   2
>  #define DRM_I915_QUERY_PERF_CONFIG  3
>  #define DRM_I915_QUERY_MEMORY_REGIONS   4
> +#define DRM_I915_QUERY_HWCONFIG_TABLE   5
>  /* Must be kept compact -- no holes and well documented */
>  
>   /**
> -- 
> 2.25.1
> 


[PATCH] drm: set DRM_RENDER_ALLOW flag on DRM_IOCTL_MODE_CREATE/DESTROY_DUMB ioctls

2021-06-10 Thread Dongwon Kim
Render clients should be able to create/destroy dumb objects to import
and use them as render buffers in case the default DRM device is different
from the render device (i.e. kmsro).

Signed-off-by: Dongwon Kim 
---
 drivers/gpu/drm/drm_ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 98ae00661656..f2f72e132741 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -685,9 +685,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_MODE_RMFB, drm_mode_rmfb_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_PAGE_FLIP, drm_mode_page_flip_ioctl, 
DRM_MASTER),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_DIRTYFB, drm_mode_dirtyfb_ioctl, 
DRM_MASTER),
-   DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_DUMB, drm_mode_create_dumb_ioctl, 
0),
+   DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_DUMB, drm_mode_create_dumb_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_MAP_DUMB, drm_mode_mmap_dumb_ioctl, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_MODE_DESTROY_DUMB, drm_mode_destroy_dumb_ioctl, 
0),
+   DRM_IOCTL_DEF(DRM_IOCTL_MODE_DESTROY_DUMB, drm_mode_destroy_dumb_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_OBJ_GETPROPERTIES, 
drm_mode_obj_get_properties_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_OBJ_SETPROPERTY, 
drm_mode_obj_set_property_ioctl, DRM_MASTER),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CURSOR2, drm_mode_cursor2_ioctl, 
DRM_MASTER),
-- 
2.20.1



Re: [PATCH 1/3] drm/i915/guc: Add fetch of hwconfig table

2021-06-10 Thread Matthew Brost
On Thu, Jun 10, 2021 at 01:46:24PM -0700, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> Implement support for fetching the hardware description table from the
> GuC. The call is made twice - once without a destination buffer to
> query the size and then a second time to fill in the buffer.
> 
> This patch also adds a header file which lists all the attribute values
> currently defined for the table. This is included for reference as
> these are not currently used by the i915 driver itself.
> 
> Note that the table is only available on ADL-P and later platforms.
> 
> Cc: Michal Wajdeczko 
> Signed-off-by: Rodrigo Vivi 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  .../gpu/drm/i915/gt/intel_hwconfig_types.h| 102 +++
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
>  .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c|   3 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 167 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 ++
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c |   6 +
>  9 files changed, 304 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 2adb6b420c7c..8e957ca7c9f1 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -187,6 +187,7 @@ i915-y += gt/uc/intel_uc.o \
> gt/uc/intel_guc_log.o \
> gt/uc/intel_guc_log_debugfs.o \
> gt/uc/intel_guc_submission.o \
> +   gt/uc/intel_guc_hwconfig.o \
> gt/uc/intel_huc.o \
> gt/uc/intel_huc_debugfs.o \
> gt/uc/intel_huc_fw.o
> diff --git a/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h 
> b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
> new file mode 100644
> index ..b09c0f65b93a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2020 Intel Corporation
> + */
> +
> +#ifndef _INTEL_HWCONFIG_TYPES_H_
> +#define _INTEL_HWCONFIG_TYPES_H_
> +
> +/**
> + * enum intel_hwconfig - Global definition of hwconfig table attributes
> + *
> + * Intel devices provide a KLV (Key/Length/Value) table containing
> + * the static hardware configuration for that platform.
> + * This header defines the current attribute keys for this KLV.
> + */
> +enum intel_hwconfig {
> + INTEL_HWCONFIG_MAX_SLICES_SUPPORTED = 1,
> + INTEL_HWCONFIG_MAX_DUAL_SUBSLICES_SUPPORTED,/* 2 */
> + INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS,  /* 3 */
> + INTEL_HWCONFIG_NUM_PIXEL_PIPES, /* 4 */
> + INTEL_HWCONFIG_DEPRECATED_MAX_NUM_GEOMETRY_PIPES,   /* 5 */
> + INTEL_HWCONFIG_DEPRECATED_L3_CACHE_SIZE_IN_KB,  /* 6 */
> + INTEL_HWCONFIG_DEPRECATED_L3_BANK_COUNT,/* 7 */
> + INTEL_HWCONFIG_L3_CACHE_WAYS_SIZE_IN_BYTES, /* 8 */
> + INTEL_HWCONFIG_L3_CACHE_WAYS_PER_SECTOR,/* 9 */
> + INTEL_HWCONFIG_MAX_MEMORY_CHANNELS, /* 10 */
> + INTEL_HWCONFIG_MEMORY_TYPE, /* 11 */
> + INTEL_HWCONFIG_CACHE_TYPES, /* 12 */
> + INTEL_HWCONFIG_LOCAL_MEMORY_PAGE_SIZES_SUPPORTED,   /* 13 */
> + INTEL_HWCONFIG_DEPRECATED_SLM_SIZE_IN_KB,   /* 14 */
> + INTEL_HWCONFIG_NUM_THREADS_PER_EU,  /* 15 */
> + INTEL_HWCONFIG_TOTAL_VS_THREADS,/* 16 */
> + INTEL_HWCONFIG_TOTAL_GS_THREADS,/* 17 */
> + INTEL_HWCONFIG_TOTAL_HS_THREADS,/* 18 */
> + INTEL_HWCONFIG_TOTAL_DS_THREADS,/* 19 */
> + INTEL_HWCONFIG_TOTAL_VS_THREADS_POCS,   /* 20 */
> + INTEL_HWCONFIG_TOTAL_PS_THREADS,/* 21 */
> + INTEL_HWCONFIG_DEPRECATED_MAX_FILL_RATE,/* 22 */
> + INTEL_HWCONFIG_MAX_RCS, /* 23 */
> + INTEL_HWCONFIG_MAX_CCS, /* 24 */
> + INTEL_HWCONFIG_MAX_VCS, /* 25 */
> + INTEL_HWCONFIG_MAX_VECS,/* 26 */
> + INTEL_HWCONFIG_MAX_COPY_CS, /* 27 */
> + INTEL_HWCONFIG_DEPRECATED_URB_SIZE_IN_KB,   /* 28 */
> + INTEL_HWCONFIG_MIN_VS_URB_ENTRIES,  /* 29 */
> + INTEL_HWCONFIG_MAX_VS_URB_ENTRIES,  /* 30 */
> + INTEL_HWCONFIG_MIN_PCS_URB_ENTRIES, /* 31 */
> + INTEL_HWCONFIG_MAX_PCS

[PATCH v5 5/5] drm/msm: devcoredump iommu fault support

2021-06-10 Thread Rob Clark
From: Rob Clark 

Wire up support to stall the SMMU on iova fault, and collect a devcore-
dump snapshot for easier debugging of faults.

Currently this is a6xx-only, but mostly only because so far it is the
only one using adreno-smmu-priv.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 19 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 38 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 42 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 15 +++
 drivers/gpu/drm/msm/msm_gem.h   |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c|  1 +
 drivers/gpu/drm/msm/msm_gpu.c   | 48 +
 drivers/gpu/drm/msm/msm_gpu.h   | 17 
 drivers/gpu/drm/msm/msm_gpummu.c|  5 +++
 drivers/gpu/drm/msm/msm_iommu.c | 11 +
 drivers/gpu/drm/msm/msm_mmu.h   |  1 +
 11 files changed, 186 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index eb030b00bff4..7a271de9a212 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1200,6 +1200,15 @@ static void a5xx_fault_detect_irq(struct msm_gpu *gpu)
struct drm_device *dev = gpu->dev;
struct msm_ringbuffer *ring = gpu->funcs->active_ring(gpu);
 
+   /*
+* If stalled on SMMU fault, we could trip the GPU's hang detection,
+* but the fault handler will trigger the devcore dump, and we want
+* to otherwise resume normally rather than killing the submit, so
+* just bail.
+*/
+   if (gpu_read(gpu, REG_A5XX_RBBM_STATUS3) & BIT(24))
+   return;
+
DRM_DEV_ERROR(dev->dev, "gpu fault ring %d fence %x status %8.8X rb 
%4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
ring ? ring->id : -1, ring ? ring->seqno : 0,
gpu_read(gpu, REG_A5XX_RBBM_STATUS),
@@ -1523,6 +1532,7 @@ static struct msm_gpu_state *a5xx_gpu_state_get(struct 
msm_gpu *gpu)
 {
struct a5xx_gpu_state *a5xx_state = kzalloc(sizeof(*a5xx_state),
GFP_KERNEL);
+   bool stalled = !!(gpu_read(gpu, REG_A5XX_RBBM_STATUS3) & BIT(24));
 
if (!a5xx_state)
return ERR_PTR(-ENOMEM);
@@ -1535,8 +1545,13 @@ static struct msm_gpu_state *a5xx_gpu_state_get(struct 
msm_gpu *gpu)
 
a5xx_state->base.rbbm_status = gpu_read(gpu, REG_A5XX_RBBM_STATUS);
 
-   /* Get the HLSQ regs with the help of the crashdumper */
-   a5xx_gpu_state_get_hlsq_regs(gpu, a5xx_state);
+   /*
+* Get the HLSQ regs with the help of the crashdumper, but only if
+* we are not stalled in an iommu fault (in which case the crashdumper
+* would not have access to memory)
+*/
+   if (!stalled)
+   a5xx_gpu_state_get_hlsq_regs(gpu, a5xx_state);
 
a5xx_set_hwcg(gpu, true);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index fc19db10bff1..c3699408bd1f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1081,6 +1081,16 @@ static int a6xx_fault_handler(void *arg, unsigned long 
iova, int flags, void *da
struct msm_gpu *gpu = arg;
struct adreno_smmu_fault_info *info = data;
const char *type = "UNKNOWN";
+   const char *block;
+   bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
+
+   /*
+* If we aren't going to be resuming later from fault_worker, then do
+* it now.
+*/
+   if (!do_devcoredump) {
+   gpu->aspace->mmu->funcs->resume_translation(gpu->aspace->mmu);
+   }
 
/*
 * Print a default message if we couldn't get the data from the
@@ -1104,15 +1114,30 @@ static int a6xx_fault_handler(void *arg, unsigned long 
iova, int flags, void *da
else if (info->fsr & ARM_SMMU_FSR_EF)
type = "EXTERNAL";
 
+   block = a6xx_fault_block(gpu, info->fsynr1 & 0xff);
+
pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s 
type=%s source=%s (%u,%u,%u,%u)\n",
info->ttbr0, iova,
-   flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ", type,
-   a6xx_fault_block(gpu, info->fsynr1 & 0xff),
+   flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ",
+   type, block,
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
 
+   if (do_devcoredump) {
+   /* Turn off the hangcheck timer to keep it from bothering us */
+   del_timer(&gpu->hangcheck_timer);
+
+   gpu->fault_info.ttbr0 = info->ttbr0;
+   

[PATCH v5 4/5] iommu/arm-smmu-qcom: Add stall support

2021-06-10 Thread Rob Clark
From: Rob Clark 

Add, via the adreno-smmu-priv interface, a way for the GPU to request
the SMMU to stall translation on faults, and then later resume the
translation, either retrying or terminating the current translation.

This will be used on the GPU side to "freeze" the GPU while we snapshot
useful state for devcoredump.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 33 ++
 include/linux/adreno-smmu-priv.h   |  7 +
 2 files changed, 40 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index b2e31ea84128..61fc645c1325 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -13,6 +13,7 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
bool bypass_quirk;
u8 bypass_cbndx;
+   u32 stall_enabled;
 };
 
 static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
@@ -23,12 +24,17 @@ static struct qcom_smmu *to_qcom_smmu(struct 
arm_smmu_device *smmu)
 static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
u32 reg)
 {
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+
/*
 * On the GPU device we want to process subsequent transactions after a
 * fault to keep the GPU from hanging
 */
reg |= ARM_SMMU_SCTLR_HUPCF;
 
+   if (qsmmu->stall_enabled & BIT(idx))
+   reg |= ARM_SMMU_SCTLR_CFCFG;
+
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
@@ -48,6 +54,31 @@ static void qcom_adreno_smmu_get_fault_info(const void 
*cookie,
info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, 
ARM_SMMU_CB_CONTEXTIDR);
 }
 
+static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
+
+   if (enabled)
+   qsmmu->stall_enabled |= BIT(cfg->cbndx);
+   else
+   qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
+}
+
+static void qcom_adreno_smmu_resume_translation(const void *cookie, bool 
terminate)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   u32 reg = 0;
+
+   if (terminate)
+   reg |= ARM_SMMU_RESUME_TERMINATE;
+
+   arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0
 
 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -173,6 +204,8 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
+   priv->set_stall = qcom_adreno_smmu_set_stall;
+   priv->resume_translation = qcom_adreno_smmu_resume_translation;
 
return 0;
 }
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index 53fe32fb9214..c637e0997f6d 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -45,6 +45,11 @@ struct adreno_smmu_fault_info {
  * TTBR0 translation is enabled with the specified cfg
  * @get_fault_info: Called by the GPU fault handler to get information about
  *  the fault
+ * @set_stall: Configure whether stall on fault (CFCFG) is enabled.  Call
+ * before set_ttbr0_cfg().  If stalling on fault is enabled,
+ * the GPU driver must call resume_translation()
+ * @resume_translation: Resume translation after a fault
+ *
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -60,6 +65,8 @@ struct adreno_smmu_priv {
 const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
 int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
 void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info 
*info);
+void (*set_stall)(const void *cookie, bool enabled);
+void (*resume_translation)(const void *cookie, bool terminate);
 };
 
 #endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.31.1



[PATCH v5 3/5] drm/msm: Improve the a6xx page fault handler

2021-06-10 Thread Rob Clark
From: Jordan Crouse 

Use the new adreno-smmu-priv fault info function to get more SMMU
debug registers and print the current TTBR0 to debug per-instance
pagetables and figure out which GPU block generated the request.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 76 +--
 drivers/gpu/drm/msm/msm_iommu.c   | 11 +++-
 drivers/gpu/drm/msm/msm_mmu.h |  4 +-
 4 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index f46562c12022..eb030b00bff4 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1075,7 +1075,7 @@ bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer 
*ring)
return true;
 }
 
-static int a5xx_fault_handler(void *arg, unsigned long iova, int flags)
+static int a5xx_fault_handler(void *arg, unsigned long iova, int flags, void 
*data)
 {
struct msm_gpu *gpu = arg;
pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d 
(%u,%u,%u,%u)\n",
@@ -1085,7 +1085,7 @@ static int a5xx_fault_handler(void *arg, unsigned long 
iova, int flags)
gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(6)),
gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(7)));
 
-   return -EFAULT;
+   return 0;
 }
 
 static void a5xx_cp_err_irq(struct msm_gpu *gpu)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c7f0ddb12d8f..fc19db10bff1 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1032,18 +1032,88 @@ static void a6xx_recover(struct msm_gpu *gpu)
msm_gpu_hw_init(gpu);
 }
 
-static int a6xx_fault_handler(void *arg, unsigned long iova, int flags)
+static const char *a6xx_uche_fault_block(struct msm_gpu *gpu, u32 mid)
+{
+   static const char *uche_clients[7] = {
+   "VFD", "SP", "VSC", "VPC", "HLSQ", "PC", "LRZ",
+   };
+   u32 val;
+
+   if (mid < 1 || mid > 3)
+   return "UNKNOWN";
+
+   /*
+* The source of the data depends on the mid ID read from FSYNR1
+* and the client ID read from the UCHE block.
+*/
+   val = gpu_read(gpu, REG_A6XX_UCHE_CLIENT_PF);
+
+   /* mid = 3 is most precise and refers to only one block per client */
+   if (mid == 3)
+   return uche_clients[val & 7];
+
+   /* For mid=2 the source is TP or VFD except when the client id is 0 */
+   if (mid == 2)
+   return ((val & 7) == 0) ? "TP" : "TP|VFD";
+
+   /* For mid=1 just return "UCHE" as a catchall for everything else */
+   return "UCHE";
+}
+
+static const char *a6xx_fault_block(struct msm_gpu *gpu, u32 id)
+{
+   if (id == 0)
+   return "CP";
+   else if (id == 4)
+   return "CCU";
+   else if (id == 6)
+   return "CDP Prefetch";
+
+   return a6xx_uche_fault_block(gpu, id);
+}
+
+#define ARM_SMMU_FSR_TF BIT(1)
+#define ARM_SMMU_FSR_PF BIT(3)
+#define ARM_SMMU_FSR_EF BIT(4)
+
+static int a6xx_fault_handler(void *arg, unsigned long iova, int flags, void 
*data)
 {
struct msm_gpu *gpu = arg;
+   struct adreno_smmu_fault_info *info = data;
+   const char *type = "UNKNOWN";
 
-   pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d 
(%u,%u,%u,%u)\n",
+   /*
+* Print a default message if we couldn't get the data from the
+* adreno-smmu-priv
+*/
+   if (!info) {
+   pr_warn_ratelimited("*** gpu fault: iova=%.16lx flags=%d 
(%u,%u,%u,%u)\n",
iova, flags,
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
 
-   return -EFAULT;
+   return 0;
+   }
+
+   if (info->fsr & ARM_SMMU_FSR_TF)
+   type = "TRANSLATION";
+   else if (info->fsr & ARM_SMMU_FSR_PF)
+   type = "PERMISSION";
+   else if (info->fsr & ARM_SMMU_FSR_EF)
+   type = "EXTERNAL";
+
+   pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s 
type=%s source=%s (%u,%u,%u,%u)\n",
+   info->ttbr0, iova,
+   flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ", type,
+   a6xx_fault_block(gpu, info->fsynr1 & 0xff),
+   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
+   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
+   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
+   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
+
+   return 0;
 }
 
 sta

[PATCH v5 2/5] iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info

2021-06-10 Thread Rob Clark
From: Jordan Crouse 

Add a callback in adreno-smmu-priv to read interesting SMMU
registers to provide an opportunity for a richer debug experience
in the GPU driver.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 17 
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  2 ++
 include/linux/adreno-smmu-priv.h   | 31 +-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 98b3a1c2a181..b2e31ea84128 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -32,6 +32,22 @@ static void qcom_adreno_smmu_write_sctlr(struct 
arm_smmu_device *smmu, int idx,
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+static void qcom_adreno_smmu_get_fault_info(const void *cookie,
+   struct adreno_smmu_fault_info *info)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
+   info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
+   info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
+   info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
+   info->cbfrsynra = arm_smmu_gr1_read(smmu, 
ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
+   info->ttbr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
+   info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, 
ARM_SMMU_CB_CONTEXTIDR);
+}
+
 #define QCOM_ADRENO_SMMU_GPU_SID 0
 
 static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
@@ -156,6 +172,7 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
priv->cookie = smmu_domain;
priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+   priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
 
return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index c31a59d35c64..84c21c4b0691 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -224,6 +224,8 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_FSYNR0 0x68
#define ARM_SMMU_FSYNR0_WNR BIT(4)
 
+#define ARM_SMMU_CB_FSYNR1 0x6c
+
 #define ARM_SMMU_CB_S1_TLBIVA  0x600
 #define ARM_SMMU_CB_S1_TLBIASID0x610
 #define ARM_SMMU_CB_S1_TLBIVAL 0x620
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index a889f28afb42..53fe32fb9214 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -8,6 +8,32 @@
 
 #include 
 
+/**
+ * struct adreno_smmu_fault_info - container for key fault information
+ *
+ * @far: The faulting IOVA from ARM_SMMU_CB_FAR
+ * @ttbr0: The current TTBR0 pagetable from ARM_SMMU_CB_TTBR0
+ * @contextidr: The value of ARM_SMMU_CB_CONTEXTIDR
+ * @fsr: The fault status from ARM_SMMU_CB_FSR
+ * @fsynr0: The value of FSYNR0 from ARM_SMMU_CB_FSYNR0
+ * @fsynr1: The value of FSYNR1 from ARM_SMMU_CB_FSYNR1
+ * @cbfrsynra: The value of CBFRSYNRA from ARM_SMMU_GR1_CBFRSYNRA(idx)
+ *
+ * This struct passes back key page fault information to the GPU driver
+ * through the get_fault_info function pointer.
+ * The GPU driver can use this information to print informative
+ * log messages and provide deeper GPU specific insight into the fault.
+ */
+struct adreno_smmu_fault_info {
+   u64 far;
+   u64 ttbr0;
+   u32 contextidr;
+   u32 fsr;
+   u32 fsynr0;
+   u32 fsynr1;
+   u32 cbfrsynra;
+};
+
 /**
  * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
  *
@@ -17,6 +43,8 @@
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
  * NULL config disables TTBR0 translation, otherwise
  * TTBR0 translation is enabled with the specified cfg
+ * @get_fault_info: Called by the GPU fault handler to get information about
+ *  the fault
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
  * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
@@ -31,6 +59,7 @@ struct adreno_smmu_priv {
 const void *cookie;
 const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
 int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info 
*info);
 };
 
-#endif /* __ADRENO_SMMU_PRIV_H */
\ No newline at end of file
+#endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.31.1



[PATCH v5 1/5] iommu/arm-smmu: Add support for driver IOMMU fault handlers

2021-06-10 Thread Rob Clark
From: Jordan Crouse 

Call report_iommu_fault() to allow upper-level drivers to register their
own fault handlers.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Acked-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..b4b32d31fc06 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
*dev)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
int idx = smmu_domain->cfg.cbndx;
+   int ret;
 
fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
if (!(fsr & ARM_SMMU_FSR_FAULT))
@@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
*dev)
iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
 
-   dev_err_ratelimited(smmu->dev,
-   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   ret = report_iommu_fault(domain, NULL, iova,
+   fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : 
IOMMU_FAULT_READ);
+
+   if (ret == -ENOSYS)
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
fsr, iova, fsynr, cbfrsynra, idx);
 
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
-- 
2.31.1



[PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-06-10 Thread Rob Clark
From: Rob Clark 

This picks up an earlier series[1] from Jordan, and adds additional
support needed to generate GPU devcore dumps on iova faults.  Original
description:

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on page faults, such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
GPU snapshotting needs to avoid crashdumper, and check the
RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
resume translation after it has had a chance to snapshot the GPUs
state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark

[1] 
https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
info
  drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: devcoredump iommu fault support

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
 drivers/gpu/drm/msm/msm_gem.h   |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
 drivers/gpu/drm/msm/msm_gpu.c   |  48 +
 drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
 drivers/gpu/drm/msm/msm_gpummu.c|   5 +
 drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
 drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
 include/linux/adreno-smmu-priv.h|  38 ++-
 15 files changed, 367 insertions(+), 21 deletions(-)

-- 
2.31.1



[PATCH 3/6] dma-buf: Document DMA_BUF_IOCTL_SYNC (v2)

2021-06-10 Thread Jason Ekstrand
This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
documentation for DMA_BUF_IOCTL_SYNC.

v2 (Daniel Vetter):
 - Fix a couple typos
 - Add commentary about synchronization with other devices
 - Use item list format for describing flags

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Cc: Christian König 
Cc: Sumit Semwal 
---
 Documentation/driver-api/dma-buf.rst |  8 +
 include/uapi/linux/dma-buf.h | 46 +++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/dma-buf.rst 
b/Documentation/driver-api/dma-buf.rst
index 7f21425d9435a..0d4c13ec1a800 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -88,6 +88,9 @@ consider though:
 - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below 
for
   details.
 
+- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
+  `DMA Buffer ioctls`_ below for details.
+
 Basic Operation and Device DMA Access
 ~
 
@@ -106,6 +109,11 @@ Implicit Fence Poll Support
 .. kernel-doc:: drivers/dma-buf/dma-buf.c
:doc: implicit fence polling
 
+DMA Buffer ioctls
+~
+
+.. kernel-doc:: include/uapi/linux/dma-buf.h
+
 Kernel Functions and Structures Reference
 ~
 
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 7f30393b92c3b..1c131002fe1ee 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -22,8 +22,52 @@
 
 #include 
 
-/* begin/end dma-buf functions used for userspace mmap. */
+/**
+ * struct dma_buf_sync - Synchronize with CPU access.
+ *
+ * When a DMA buffer is accessed from the CPU via mmap, it is not always
+ * possible to guarantee coherency between the CPU-visible map and underlying
+ * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
+ * any CPU access to give the kernel the chance to shuffle memory around if
+ * needed.
+ *
+ * Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC
+ * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
+ * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
+ * DMA_BUF_SYNC_END and the same read/write flags.
+ *
+ * The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache
+ * coherency.  It does not prevent other processes or devices from
+ * accessing the memory at the same time.  If synchronization with a GPU or
+ * other device driver is required, it is the client's responsibility to
+ * wait for buffer to be ready for reading or writing.  If the driver or
+ * API with which the client is interacting uses implicit synchronization,
+ * this can be done via poll() on the DMA buffer file descriptor.  If the
+ * driver or API requires explicit synchronization, the client may have to
+ * wait on a sync_file or other synchronization primitive outside the scope
+ * of the DMA buffer API.
+ */
 struct dma_buf_sync {
+   /**
+* @flags: Set of access flags
+*
+* DMA_BUF_SYNC_START:
+* Indicates the start of a map access session.
+*
+* DMA_BUF_SYNC_END:
+* Indicates the end of a map access session.
+*
+* DMA_BUF_SYNC_READ:
+* Indicates that the mapped DMA buffer will be read by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_WRITE:
+* Indicates that the mapped DMA buffer will be written by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_RW:
+* An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.
+*/
__u64 flags;
 };
 
-- 
2.31.1



Re: [PATCH] drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check

2021-06-10 Thread Harry Wentland



On 2021-06-07 10:53 a.m., Mark Yacoub wrote:
> On Fri, Jun 4, 2021 at 4:17 PM Harry Wentland  wrote:
>>
>>
>>
>> On 2021-06-04 1:01 p.m., Mark Yacoub wrote:
>>> From: Mark Yacoub 
>>>
>>> For each CRTC state, check the size of Gamma and Degamma LUTs  so
>>> unexpected and larger sizes wouldn't slip through.
>>>
>>> TEST: IGT:kms_color::pipe-invalid-gamma-lut-sizes
>>>
>>> Signed-off-by: Mark Yacoub 
>>> Change-Id: I9d513a38e8ac2af1b4bf802e1feb1a4d726fba4c
>>> ---
>>>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++
>>>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
>>>  .../amd/display/amdgpu_dm/amdgpu_dm_color.c   | 40 ---
>>>  3 files changed, 38 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index 38d497d30dba8..f6cd522b42a80 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -9402,6 +9402,9 @@ static int amdgpu_dm_atomic_check(struct drm_device 
>>> *dev,
>>>   dm_old_crtc_state->dsc_force_changed == false)
>>>   continue;
>>>
>>> + if ((ret = amdgpu_dm_verify_lut_sizes(new_crtc_state)))
>>> + goto fail;
>>> +
>>>   if (!new_crtc_state->enable)
>>>   continue;
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> index 8bfe901cf2374..1b77cd2612691 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
>>> @@ -541,6 +541,7 @@ void amdgpu_dm_trigger_timing_sync(struct drm_device 
>>> *dev);
>>>  #define MAX_COLOR_LEGACY_LUT_ENTRIES 256
>>>
>>>  void amdgpu_dm_init_color_mod(void);
>>> +int amdgpu_dm_verify_lut_sizes(const struct drm_crtc_state *crtc_state);
>>>  int amdgpu_dm_update_crtc_color_mgmt(struct dm_crtc_state *crtc);
>>>  int amdgpu_dm_update_plane_color_mgmt(struct dm_crtc_state *crtc,
>>> struct dc_plane_state *dc_plane_state);
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
>>> index 157fe4efbb599..da6f9fcc0b415 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
>>> @@ -284,6 +284,37 @@ static int __set_input_tf(struct dc_transfer_func 
>>> *func,
>>>   return res ? 0 : -ENOMEM;
>>>  }
>>>
>>> +/**
>>> + * Verifies that the Degamma and Gamma LUTs attached to the |crtc_state| 
>>> are of
>>> + * the expected size.
>>> + * Returns 0 on success.
>>> + */
>>> +int amdgpu_dm_verify_lut_sizes(const struct drm_crtc_state *crtc_state)
>>> +{
>>> + const struct drm_color_lut *lut = NULL;
>>> + uint32_t size = 0;
>>> +
>>> + lut = __extract_blob_lut(crtc_state->degamma_lut, &size);
>>> + if (lut && size != MAX_COLOR_LUT_ENTRIES) {
>>
>> Isn't the point of the LUT size that it can be variable? Did you observe any
>> problems with LUTs that are not of size 4096?
> Is it supposed to be variable?
> I'm basing my knowledge of LUTs on this IGT Test:
> https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/kms_color_helper.c#L281
>  It does check for invalid sizes and for the exact size, giving me the
> impression that it's not too flexible.
> Is variability of size an AMD specific behavior or should it be a DRM 
> behavior?
>>
>> Legacy X-based userspace will give us 256 size LUTs. We can't break support 
>> for
>> that. See MAX_COLOR_LEGACY_LUT_ENTRIES.
> In the new function `amdgpu_dm_verify_lut_sizes`, I maintained parity
> with the old behavior. In `amdgpu_dm_update_crtc_color_mgmt`, the
> degamma size is only checked against `MAX_COLOR_LUT_ENTRIES` while
> regamma_size size is checked against both MAX_COLOR_LUT_ENTRIES and
> MAX_COLOR_LEGACY_LUT_ENTRIES:
> https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c#L321
>  Also, in the definition of MAX_COLOR_LEGACY_LUT_ENTRIES, it mentions
> "Legacy gamm[sic] LUT" not degamma:
> https://gitlab.freedesktop.org/agd5f/linux/-/blame/amd-staging-drm-next/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h#L616
>  As well as the commit when it was introduced, it seems to be handling
> gammas rather than degamma LUTs:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/086247a4b2fba49800b27807f22bb894cd8363fb
>  Let me know if this would be a bug in the old behavior and I can fix
> it, or if i'm missing something.

Ah, yes, you're right, of course. Thanks for walking me through it. :)

Reviewed-by: Harry Wentland 

Harry

>>
>> Harry
>>
>>> + DRM_DEBUG_DRIVER(
>>> + "Invalid Degamma LUT size. Should be %u but got 
>>> %u.\n",
>>> +  

[PATCH 5/6] RFC: dma-buf: Add an extra fence to dma_resv_get_singleton_unlocked

2021-06-10 Thread Jason Ekstrand
For dma-buf sync_file import, we want to get all the fences on a
dma_resv plus one more.  We could wrap the fence we get back in an array
fence or we could make dma_resv_get_singleton_unlocked take "one more"
to make this case easier.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Cc: Christian König 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c  |  2 +-
 drivers/dma-buf/dma-resv.c | 23 +--
 include/linux/dma-resv.h   |  3 ++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 41b14b53cdda3..831828d71b646 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -389,7 +389,7 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
return fd;
 
if (arg.flags & DMA_BUF_SYNC_WRITE) {
-   fence = dma_resv_get_singleton(dmabuf->resv);
+   fence = dma_resv_get_singleton(dmabuf->resv, NULL);
if (IS_ERR(fence)) {
ret = PTR_ERR(fence);
goto err_put_fd;
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 1b26aa7e5d81c..7c48c23239b4b 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -504,6 +504,7 @@ EXPORT_SYMBOL_GPL(dma_resv_get_fences);
 /**
  * dma_resv_get_singleton - get a single fence for the dma_resv object
  * @obj: the reservation object
+ * @extra: extra fence to add to the resulting array
  *
  * Get a single fence representing all unsignaled fences in the dma_resv object
  * plus the given extra fence. If we got only one fence return a new
@@ -512,7 +513,8 @@ EXPORT_SYMBOL_GPL(dma_resv_get_fences);
  * RETURNS
  * The singleton dma_fence on success or an ERR_PTR on failure
  */
-struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj)
+struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj,
+struct dma_fence *extra)
 {
struct dma_fence *result, **resv_fences, *fence, *chain, **fences;
struct dma_fence_array *array;
@@ -523,7 +525,7 @@ struct dma_fence *dma_resv_get_singleton(struct dma_resv 
*obj)
if (err)
return ERR_PTR(err);
 
-   if (num_resv_fences == 0)
+   if (num_resv_fences == 0 && !extra)
return NULL;
 
num_fences = 0;
@@ -539,6 +541,16 @@ struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj)
}
}
 
+   if (extra) {
+   dma_fence_deep_dive_for_each(fence, chain, j, extra) {
+   if (dma_fence_is_signaled(fence))
+   continue;
+
+   result = fence;
+   ++num_fences;
+   }
+   }
+
if (num_fences <= 1) {
result = dma_fence_get(result);
goto put_resv_fences;
@@ -559,6 +571,13 @@ struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj)
}
}
 
+   if (extra) {
+   dma_fence_deep_dive_for_each(fence, chain, j, extra) {
+   if (dma_fence_is_signaled(fence))
+   fences[num_fences++] = dma_fence_get(fence);
+   }
+   }
+
if (num_fences <= 1) {
result = num_fences ? fences[0] : NULL;
kfree(fences);
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index d60982975a786..f970e03fc1a08 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -275,7 +275,8 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
 int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl,
unsigned *pshared_count, struct dma_fence ***pshared);
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
-struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj);
+struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj,
+struct dma_fence *extra);
 long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
   unsigned long timeout);
 bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all);
-- 
2.31.1



[PATCH 6/6] RFC: dma-buf: Add an API for importing sync files (v7)

2021-06-10 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

v7 (Daniel Vetter):
 - Fix the uapi header to use the right struct in the ioctl
 - Use a separate dma_buf_import_sync_file struct
 - Add kerneldoc for dma_buf_import_sync_file

Signed-off-by: Jason Ekstrand 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 36 
 include/uapi/linux/dma-buf.h | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 831828d71b646..88afd723015a2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -422,6 +422,40 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+const void __user *user_data)
+{
+   struct dma_buf_import_sync_file arg;
+   struct dma_fence *fence, *singleton = NULL;
+   int ret = 0;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   return -EFAULT;
+
+   if (arg.flags != DMA_BUF_SYNC_RW)
+   return -EINVAL;
+
+   fence = sync_file_get_fence(arg.fd);
+   if (!fence)
+   return -EINVAL;

[PATCH 4/6] dma-buf: Add an API for exporting sync files (v12)

2021-06-10 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

By making this an ioctl on the dma-buf itself, it allows this new
functionality to be used in an entirely driver-agnostic way without
having access to a DRM fd. This makes it ideal for use in driver-generic
code in Mesa or in a client such as a compositor where the DRM fd may be
hard to reach.

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Drop the sync_file import as it was all-around sketchy and not nearly
   as useful as import.
 - Re-introduce READ/WRITE flag support for export
 - Rework the commit message

v7 (Jason Ekstrand):
 - Require at least one sync flag
 - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
 - Use _rcu helpers since we're accessing the dma_resv read-only

v8 (Jason Ekstrand):
 - Return -ENOMEM if the sync_file_create fails
 - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)

v9 (Jason Ekstrand):
 - Add documentation for the new ioctl

v10 (Jason Ekstrand):
 - Go back to dma_buf_sync_file as the ioctl struct name

v11 (Daniel Vetter):
 - Go back to dma_buf_export_sync_file as the ioctl struct name
 - Better kerneldoc describing what the read/write flags do

v12 (Christian König):
 - Document why we chose to make it an ioctl on dma-buf

Signed-off-by: Jason Ekstrand 
Acked-by: Simon Ser 
Acked-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 67 
 include/uapi/linux/dma-buf.h | 35 +++
 2 files changed, 102 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 511fe0d217a08..41b14b53cdda3 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,6 +192,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
  * Note that this only signals the completion of the respective fences, i.e. the
  * DMA transfers are complete. Cache flushing and any other necessary
  * preparations before CPU access can begin still need to happen.
+ *
+ * As an alternative to poll(), the set of fences on DMA buffer can be
+ * exported as a &sync_file using &dma_buf_export_sync_file.
  */
 
 static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
@@ -362,6 +366,64 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
return ret;
 }
 
+#if IS_ENABLED(CONFIG_SYNC_FILE)
+static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
+void __user *user_data)
+{
+   struct dma_buf_export_sync_file arg;
+   struct dma_fence *fence = NULL;
+   struct sync_file *sync_file;
+   int fd, ret;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   r

[PATCH 3/6] dma-buf: Document DMA_BUF_IOCTL_SYNC (v2)

2021-06-10 Thread Jason Ekstrand
This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
documentation for DMA_BUF_IOCTL_SYNC.

v2 (Daniel Vetter):
 - Fix a couple typos
 - Add commentary about synchronization with other devices
 - Use item list format for describing flags

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Cc: Christian König 
Cc: Sumit Semwal 
---
 Documentation/driver-api/dma-buf.rst |  8 +
 include/uapi/linux/dma-buf.h | 46 +++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 7f21425d9435a..0d4c13ec1a800 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -88,6 +88,9 @@ consider though:
 - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below for
   details.
 
+- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
+  `DMA Buffer ioctls`_ below for details.
+
 Basic Operation and Device DMA Access
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -106,6 +109,11 @@ Implicit Fence Poll Support
 .. kernel-doc:: drivers/dma-buf/dma-buf.c
:doc: implicit fence polling
 
+DMA Buffer ioctls
+~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/uapi/linux/dma-buf.h
+
 Kernel Functions and Structures Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 7f30393b92c3b..1c131002fe1ee 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -22,8 +22,52 @@
 
 #include 
 
-/* begin/end dma-buf functions used for userspace mmap. */
+/**
+ * struct dma_buf_sync - Synchronize with CPU access.
+ *
+ * When a DMA buffer is accessed from the CPU via mmap, it is not always
+ * possible to guarantee coherency between the CPU-visible map and underlying
+ * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
+ * any CPU access to give the kernel the chance to shuffle memory around if
+ * needed.
+ *
+ * Prior to accessing the map, the client must call DMA_BUF_IOCTL_SYNC
+ * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
+ * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
+ * DMA_BUF_SYNC_END and the same read/write flags.
+ *
+ * The synchronization provided via DMA_BUF_IOCTL_SYNC only provides cache
+ * coherency.  It does not prevent other processes or devices from
+ * accessing the memory at the same time.  If synchronization with a GPU or
+ * other device driver is required, it is the client's responsibility to
+ * wait for buffer to be ready for reading or writing.  If the driver or
+ * API with which the client is interacting uses implicit synchronization,
+ * this can be done via poll() on the DMA buffer file descriptor.  If the
+ * driver or API requires explicit synchronization, the client may have to
+ * wait on a sync_file or other synchronization primitive outside the scope
+ * of the DMA buffer API.
+ */
 struct dma_buf_sync {
+   /**
+* @flags: Set of access flags
+*
+* DMA_BUF_SYNC_START:
+* Indicates the start of a map access session.
+*
+* DMA_BUF_SYNC_END:
+* Indicates the end of a map access session.
+*
+* DMA_BUF_SYNC_READ:
+* Indicates that the mapped DMA buffer will be read by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_WRITE:
+* Indicates that the mapped DMA buffer will be written by the
+* client via the CPU map.
+*
+* DMA_BUF_SYNC_RW:
+* An alias for DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE.
+*/
__u64 flags;
 };
 
-- 
2.31.1



[PATCH 2/6] dma-buf: Add dma_resv_get_singleton (v6)

2021-06-10 Thread Jason Ekstrand
Add a helper function to get a single fence representing
all fences in a dma_resv object.

This fence is either the only one in the object, or all of the
unsignaled fences of the object flattened out into a dma_fence_array.

v2 (Jason Ekstrand):
 - Take reference of fences both for creating the dma_fence_array and in
   the case where we return one fence.
 - Handle the case where dma_resv_get_list() returns NULL

v3 (Jason Ekstrand):
 - Add an _rcu suffix because it is read-only
 - Rewrite to use dma_resv_get_fences_rcu so it's RCU-safe
 - Add an EXPORT_SYMBOL_GPL declaration
 - Re-author the patch to Jason since very little is left of Christian
   König's original patch
 - Remove the extra fence argument

v4 (Jason Ekstrand):
 - Restore the extra fence argument

v5 (Daniel Vetter):
 - Rename from _rcu to _unlocked since it doesn't leak RCU details to
   the caller
 - Fix docs
 - Use ERR_PTR for error handling rather than an output dma_fence**

v5 (Jason Ekstrand):
 - Drop the extra fence param and leave that to a separate patch

v6 (Jason Ekstrand):
 - Rename to dma_resv_get_singleton to match the new naming convention
   for dma_resv helpers which work without taking a lock.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Daniel Vetter 
Cc: Christian König 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-resv.c | 91 ++
 include/linux/dma-resv.h   |  1 +
 2 files changed, 92 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index f26c71747d43a..1b26aa7e5d81c 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,8 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -50,6 +52,10 @@
  * write-side updates.
  */
 
+#define dma_fence_deep_dive_for_each(fence, chain, index, head)\
+   dma_fence_chain_for_each(chain, head)   \
+   dma_fence_array_for_each(fence, index, chain)
+
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
@@ -495,6 +501,91 @@ int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl,
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences);
 
+/**
+ * dma_resv_get_singleton - get a single fence for the dma_resv object
+ * @obj: the reservation object
+ *
+ * Get a single fence representing all unsignaled fences in the dma_resv
+ * object. If there is only one such fence, return a new reference to it;
+ * otherwise return a dma_fence_array object.
+ *
+ * RETURNS
+ * The singleton dma_fence on success or an ERR_PTR on failure
+ */
+struct dma_fence *dma_resv_get_singleton(struct dma_resv *obj)
+{
+   struct dma_fence *result, **resv_fences, *fence, *chain, **fences;
+   struct dma_fence_array *array;
+   unsigned int num_resv_fences, num_fences;
+   int err;
+   unsigned int i, j;
+
+   err = dma_resv_get_fences(obj, NULL, &num_resv_fences, &resv_fences);
+   if (err)
+   return ERR_PTR(err);
+
+   if (num_resv_fences == 0)
+   return NULL;
+
+   num_fences = 0;
+   result = NULL;
+
+   for (i = 0; i < num_resv_fences; ++i) {
+   dma_fence_deep_dive_for_each(fence, chain, j, resv_fences[i]) {
+   if (dma_fence_is_signaled(fence))
+   continue;
+
+   result = fence;
+   ++num_fences;
+   }
+   }
+
+   if (num_fences <= 1) {
+   result = dma_fence_get(result);
+   goto put_resv_fences;
+   }
+
+   fences = kmalloc_array(num_fences, sizeof(struct dma_fence *),
+  GFP_KERNEL);
+   if (!fences) {
+   result = ERR_PTR(-ENOMEM);
+   goto put_resv_fences;
+   }
+
+   num_fences = 0;
+   for (i = 0; i < num_resv_fences; ++i) {
+   dma_fence_deep_dive_for_each(fence, chain, j, resv_fences[i]) {
+   if (!dma_fence_is_signaled(fence))
+   fences[num_fences++] = dma_fence_get(fence);
+   }
+   }
+
+   if (num_fences <= 1) {
+   result = num_fences ? fences[0] : NULL;
+   kfree(fences);
+   goto put_resv_fences;
+   }
+
+   array = dma_fence_array_create(num_fences, fences,
+  dma_fence_context_alloc(1),
+  1, false);
+   if (array) {
+   result = &array->base;
+   } else {
+   result = ERR_PTR(-ENOMEM);
+   while (num_fences--)
+   dma_fence_put(fences[num_fences]);
+   kfree(fences);
+   }
+
+put_resv_fences:
+   while (num_resv_fences--)
+   dma_fence_put(resv_fences[num_resv_fences]);
+   kfree(resv_fences);
+
+   return result;
+}
+EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
+
 /**
  * dma_resv_wait_timeout - Wait on reservatio

[PATCH 1/6] dma-buf: Add dma_fence_array_for_each (v2)

2021-06-10 Thread Jason Ekstrand
From: Christian König 

Add a helper to iterate over all fences in a dma_fence_array object.

v2 (Jason Ekstrand)
 - Return NULL from dma_fence_array_first if head == NULL.  This matches
   the iterator behavior of dma_fence_chain_for_each in that it iterates
   zero times if head == NULL.
 - Return NULL from dma_fence_array_next if index >= array->num_fences.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
Reviewed-by: Christian König 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-fence-array.c | 27 +++
 include/linux/dma-fence-array.h   | 17 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be944..2ac1afc697d0f 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -201,3 +201,30 @@ bool dma_fence_match_context(struct dma_fence *fence, u64 context)
return true;
 }
 EXPORT_SYMBOL(dma_fence_match_context);
+
+struct dma_fence *dma_fence_array_first(struct dma_fence *head)
+{
+   struct dma_fence_array *array;
+
+   if (!head)
+   return NULL;
+
+   array = to_dma_fence_array(head);
+   if (!array)
+   return head;
+
+   return array->fences[0];
+}
+EXPORT_SYMBOL(dma_fence_array_first);
+
+struct dma_fence *dma_fence_array_next(struct dma_fence *head,
+  unsigned int index)
+{
+   struct dma_fence_array *array = to_dma_fence_array(head);
+
+   if (!array || index >= array->num_fences)
+   return NULL;
+
+   return array->fences[index];
+}
+EXPORT_SYMBOL(dma_fence_array_next);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 303dd712220fd..588ac8089dd61 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -74,6 +74,19 @@ to_dma_fence_array(struct dma_fence *fence)
return container_of(fence, struct dma_fence_array, base);
 }
 
+/**
+ * dma_fence_array_for_each - iterate over all fences in array
+ * @fence: current fence
+ * @index: index into the array
+ * @head: potential dma_fence_array object
+ *
+ * Test if @head is a dma_fence_array object and if yes iterate over all fences
+ * in the array. If not, just iterate over the single fence in @head itself.
+ */
+#define dma_fence_array_for_each(fence, index, head)   \
+   for (index = 0, fence = dma_fence_array_first(head); fence; \
+++(index), fence = dma_fence_array_next(head, index))
+
 struct dma_fence_array *dma_fence_array_create(int num_fences,
   struct dma_fence **fences,
   u64 context, unsigned seqno,
@@ -81,4 +94,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
 
 bool dma_fence_match_context(struct dma_fence *fence, u64 context);
 
+struct dma_fence *dma_fence_array_first(struct dma_fence *head);
+struct dma_fence *dma_fence_array_next(struct dma_fence *head,
+  unsigned int index);
+
 #endif /* __LINUX_DMA_FENCE_ARRAY_H */
-- 
2.31.1



[PATCH 0/6] dma-buf: Add an API for exporting sync files (v12)

2021-06-10 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

This patch series actually contains two new ioctls.  There is the export
one mentioned above as well as an RFC for an import ioctl which provides
the other half.  The intention is to land the export ioctl since it seems
like there's no real disagreement on that one.  The import ioctl, however,
has a lot of debate around it so it's intended to be RFC-only for now.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

v12 (Daniel Vetter):
 - Better docs for DMA_BUF_IOCTL_SYNC

v12 (Christian König):
 - Drop the rename patch in favor of Christian's series
 - Add a comment to the commit message for the dma-buf sync_file export
   ioctl saying why we made it an ioctl on dma-buf

Cc: Christian König 
Cc: Michel Dänzer 
Cc: Dave Airlie 
Cc: Bas Nieuwenhuizen 
Cc: Daniel Stone 
Cc: mesa-...@lists.freedesktop.org
Cc: wayland-de...@lists.freedesktop.org
Test-with: 20210524205225.872316-1-ja...@jlekstrand.net

Christian König (1):
  dma-buf: Add dma_fence_array_for_each (v2)

Jason Ekstrand (5):
  dma-buf: Add dma_resv_get_singleton (v6)
  dma-buf: Document DMA_BUF_IOCTL_SYNC (v2)
  dma-buf: Add an API for exporting sync files (v12)
  RFC: dma-buf: Add an extra fence to dma_resv_get_singleton_unlocked
  RFC: dma-buf: Add an API for importing sync files (v7)

 Documentation/driver-api/dma-buf.rst |   8 ++
 drivers/dma-buf/dma-buf.c| 103 +
 drivers/dma-buf/dma-fence-array.c|  27 +++
 drivers/dma-buf/dma-resv.c   | 110 +++
 include/linux/dma-fence-array.h  |  17 +
 include/linux/dma-resv.h |   2 +
 include/uapi/linux/dma-buf.h | 103 -
 7 files changed, 369 insertions(+), 1 deletion(-)

-- 
2.31.1



Re: [PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 7:47 PM Maxime Ripard  wrote:
>
> New KMS properties come with a bunch of requirements to avoid each
> driver from running their own, inconsistent, set of properties,
> eventually leading to issues like property conflicts, inconsistencies
> between drivers and semantics, etc.
>
> Let's document what we expect.
>
> Cc: Alexandre Belloni 
> Cc: Alexandre Torgue 
> Cc: Alex Deucher 
> Cc: Alison Wang 
> Cc: Alyssa Rosenzweig 
> Cc: Andrew Jeffery 
> Cc: Andrzej Hajda 
> Cc: Anitha Chrisanthus 
> Cc: Benjamin Gaignard 
> Cc: Ben Skeggs 
> Cc: Boris Brezillon 
> Cc: Brian Starkey 
> Cc: Chen Feng 
> Cc: Chen-Yu Tsai 
> Cc: Christian Gmeiner 
> Cc: "Christian König" 
> Cc: Chun-Kuang Hu 
> Cc: Edmund Dea 
> Cc: Eric Anholt 
> Cc: Fabio Estevam 
> Cc: Gerd Hoffmann 
> Cc: Haneen Mohammed 
> Cc: Hans de Goede 
> Cc: "Heiko Stübner" 
> Cc: Huang Rui 
> Cc: Hyun Kwon 
> Cc: Inki Dae 
> Cc: Jani Nikula 
> Cc: Jernej Skrabec 
> Cc: Jerome Brunet 
> Cc: Joel Stanley 
> Cc: John Stultz 
> Cc: Jonas Karlman 
> Cc: Jonathan Hunter 
> Cc: Joonas Lahtinen 
> Cc: Joonyoung Shim 
> Cc: Jyri Sarha 
> Cc: Kevin Hilman 
> Cc: Kieran Bingham 
> Cc: Krzysztof Kozlowski 
> Cc: Kyungmin Park 
> Cc: Laurent Pinchart 
> Cc: Linus Walleij 
> Cc: Liviu Dudau 
> Cc: Lucas Stach 
> Cc: Ludovic Desroches 
> Cc: Marek Vasut 
> Cc: Martin Blumenstingl 
> Cc: Matthias Brugger 
> Cc: Maxime Coquelin 
> Cc: Maxime Ripard 
> Cc: Melissa Wen 
> Cc: Neil Armstrong 
> Cc: Nicolas Ferre 
> Cc: "Noralf Trønnes" 
> Cc: NXP Linux Team 
> Cc: Oleksandr Andrushchenko 
> Cc: Patrik Jakobsson 
> Cc: Paul Cercueil 
> Cc: Pengutronix Kernel Team 
> Cc: Philippe Cornu 
> Cc: Philipp Zabel 
> Cc: Qiang Yu 
> Cc: Rob Clark 
> Cc: Robert Foss 
> Cc: Rob Herring 
> Cc: Rodrigo Siqueira 
> Cc: Rodrigo Vivi 
> Cc: Roland Scheidegger 
> Cc: Russell King 
> Cc: Sam Ravnborg 
> Cc: Sandy Huang 
> Cc: Sascha Hauer 
> Cc: Sean Paul 
> Cc: Seung-Woo Kim 
> Cc: Shawn Guo 
> Cc: Stefan Agner 
> Cc: Steven Price 
> Cc: Sumit Semwal 
> Cc: Thierry Reding 
> Cc: Tian Tao 
> Cc: Tomeu Vizoso 
> Cc: Tomi Valkeinen 
> Cc: VMware Graphics 
> Cc: Xinliang Liu 
> Cc: Xinwei Kong 
> Cc: Yannick Fertre 
> Cc: Zack Rusin 
> Reviewed-by: Daniel Vetter 
> Signed-off-by: Maxime Ripard 
>
> ---
>
> Changes from v2:
>   - Take into account the feedback from Laurent and Lidiu to no longer
> force generic properties, but prefix vendor-specific properties with
> the vendor name

I'm pretty sure my r-b was without this ... Why exactly do we need
this? KMS is meant to be fairly generic (bugs throw a wrench around
here sometimes, and semantics can be tricky). If we open up the door
to yolo vendor properties in upstream, then that goal is pretty much
written off. And we've been there with vendor properties, it's a
gigantic mess.

Minimally drop my r-b, I'm definitely not in support of this idea.

If there's a strong consensus that we really need this then I'm not
going to nack this, but this really needs a pile of acks from
compositor folks that they're willing to live with the resulting
fallout this will likely bring. Your cc list seems to have an absence
of compositor folks, but instead every driver maintainer. That's
backwards. We make uapi for userspace, not for kernel driver
maintainers!

tl;dr: I'd go back to v2. And then cc compositor folks on this to get their ack.
-Daniel

> Changes from v1:
>   - Typos and wording reported by Daniel and Alex
> ---
>  Documentation/gpu/drm-kms.rst | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 87e5023e3f55..bbe254dca635 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -463,6 +463,33 @@ KMS Properties
>  This section of the documentation is primarily aimed at user-space 
> developers.
>  For the driver APIs, see the other sections.
>
> +Requirements
> +
> +
> +KMS drivers might need to add extra properties to support new features.
> +Each new property introduced in a driver needs to meet a few
> +requirements, in addition to the ones mentioned above:
> +
> +- Before the introduction of any vendor-specific properties, they must
> +  be first checked against the generic ones to avoid any conflict or
> +  redundancy.
> +
> +- Vendor-specific properties must be prefixed by the vendor's name,
> +  following the syntax "$vendor:$property".
> +
> +- Generic properties must be standardized, with some documentation to
> +  describe how the property can be used.
> +
> +- Generic properties must provide a generic helper in the core code to
> +  register that property on the object it attaches to.
> +
> +- Generic properties content must be decoded by the core and provided in
> +  the object's associated state structure. That includes anything
> +  drivers might want to precompute, like :c:type:`struct drm_clip_rect
> +  ` for planes.
> +
> +- An IGT test should be submitted.
> +
>

Re: [PATCH 4/7] dma-buf: Document DMA_BUF_IOCTL_SYNC

2021-06-10 Thread Jason Ekstrand
On Thu, May 27, 2021 at 5:38 AM Daniel Vetter  wrote:
>
> On Tue, May 25, 2021 at 04:17:50PM -0500, Jason Ekstrand wrote:
> > This adds a new "DMA Buffer ioctls" section to the dma-buf docs and adds
> > documentation for DMA_BUF_IOCTL_SYNC.
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Christian König 
> > Cc: Sumit Semwal 
>
> We're still missing the doc for the SET_NAME ioctl, but maybe Sumit can be
> motivated to fix that?
>
> > ---
> >  Documentation/driver-api/dma-buf.rst |  8 +++
> >  include/uapi/linux/dma-buf.h | 32 +++-
> >  2 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/driver-api/dma-buf.rst 
> > b/Documentation/driver-api/dma-buf.rst
> > index 7f37ec30d9fd7..784f84fe50a5e 100644
> > --- a/Documentation/driver-api/dma-buf.rst
> > +++ b/Documentation/driver-api/dma-buf.rst
> > @@ -88,6 +88,9 @@ consider though:
> >  - The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ 
> > below for
> >details.
> >
> > +- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
> > +  `DMA Buffer ioctls`_ below for details.
> > +
> >  Basic Operation and Device DMA Access
> >  ~
> >
> > @@ -106,6 +109,11 @@ Implicit Fence Poll Support
> >  .. kernel-doc:: drivers/dma-buf/dma-buf.c
> > :doc: implicit fence polling
> >
> > +DMA Buffer ioctls
> > +~
> > +
> > +.. kernel-doc:: include/uapi/linux/dma-buf.h
> > +
> >  Kernel Functions and Structures Reference
> >  ~
> >
> > diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> > index 7f30393b92c3b..1f67ced853b14 100644
> > --- a/include/uapi/linux/dma-buf.h
> > +++ b/include/uapi/linux/dma-buf.h
> > @@ -22,8 +22,38 @@
> >
> >  #include 
> >
> > -/* begin/end dma-buf functions used for userspace mmap. */
> > +/**
> > + * struct dma_buf_sync - Synchronize with CPU access.
> > + *
> > + * When a DMA buffer is accessed from the CPU via mmap, it is not always
> > + * possible to guarantee coherency between the CPU-visible map and 
> > underlying
> > + * memory.  To manage coherency, DMA_BUF_IOCTL_SYNC must be used to bracket
> > + * any CPU access to give the kernel the chance to shuffle memory around if
> > + * needed.
> > + *
> > + * Prior to accessing the map, the client should call DMA_BUF_IOCTL_SYNC
>
> s/should/must/
>
> > + * with DMA_BUF_SYNC_START and the appropriate read/write flags.  Once the
> > + * access is complete, the client should call DMA_BUF_IOCTL_SYNC with
> > + * DMA_BUF_SYNC_END and the same read/write flags.
>
> I think we should make it really clear here that this is _only_ for cache
> coherency, and that furthermore if you want coherency with gpu access you
> either need to use poll() for implicit sync (link to the relevant section)
> or handle explicit sync with sync_file (again link would be awesome).

I've added such a comment.  I encourage you to look at v2 which I'll
be sending shortly.  I'm not sure how to get the poll() reference to
hyperlink, though.

> > + */
> >  struct dma_buf_sync {
> > + /**
> > +  * @flags: Set of access flags
> > +  *
> > +  * - DMA_BUF_SYNC_START: Indicates the start of a map access
>
> Bikeshed, but I think the item list format instead of bullet point list
> looks neater, e.g.  DOC: standard plane properties in drm_plane.c.

Yeah, that's better.

> > +  *   session.
> > +  *
> > +  * - DMA_BUF_SYNC_END: Indicates the end of a map access session.
> > +  *
> > +  * - DMA_BUF_SYNC_READ: Indicates that the mapped DMA buffer will
> > +  *   be read by the client via the CPU map.
> > +  *
> > +  * - DMA_BUF_SYNC_READ: Indicates that the mapped DMA buffer will
>
> s/READ/WRITE/

Oops.

> > +  *   be written by the client via the CPU map.
> > +  *
> > +  * - DMA_BUF_SYNC_RW: An alias for DMA_BUF_SYNC_READ |
> > +  *   DMA_BUF_SYNC_WRITE.
> > +  */
>
> With the nits addressed: Reviewed-by: Daniel Vetter 

Thanks!

> >   __u64 flags;
> >  };
> >
> > --
> > 2.31.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-06-10 Thread John . C . Harrison
From: John Harrison 

Various UMDs need to know the L3 bank count. So add a query API for it.

Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
 drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
 drivers/gpu/drm/i915/i915_query.c  | 22 ++
 drivers/gpu/drm/i915/i915_reg.h|  1 +
 include/uapi/drm/i915_drm.h|  1 +
 5 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 2161bf01ef8b..708bb3581d83 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
intel_sseu_dump(&info->sseu, p);
 }
+
+int intel_gt_get_l3bank_count(struct intel_gt *gt)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_wakeref_t wakeref;
+   u32 fuse3;
+
+   if (GRAPHICS_VER(i915) < 12)
+   return -ENODEV;
+
+   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+   fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
+
+   return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 7ec395cace69..46aa1cf4cf30 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
*gt)
 
 void intel_gt_info_print(const struct intel_gt_info *info,
 struct drm_printer *p);
+int intel_gt_get_l3bank_count(struct intel_gt *gt);
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 96bd8fb3e895..0e92bb2d21b2 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -10,6 +10,7 @@
 #include "i915_perf.h"
 #include "i915_query.h"
 #include 
+#include "gt/intel_gt.h"
 
 static int copy_query_item(void *query_hdr, size_t query_sz,
   u32 total_length,
@@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
*i915,
return hwconfig->size;
 }
 
+static int query_l3banks(struct drm_i915_private *i915,
+struct drm_i915_query_item *query_item)
+{
+   u32 banks;
+
+   if (query_item->length == 0)
+   return sizeof(banks);
+
+   if (query_item->length < sizeof(banks))
+   return -EINVAL;
+
+   banks = intel_gt_get_l3bank_count(&i915->gt);
+
+   if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
+&banks, sizeof(banks)))
+   return -EFAULT;
+
+   return sizeof(banks);
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
query_perf_config,
query_memregion_info,
query_hwconfig_table,
+   query_l3banks,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index eb13c601d680..e9ba88fe3db7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #defineGEN10_MIRROR_FUSE3  _MMIO(0x9118)
 #define GEN10_L3BANK_PAIR_COUNT 4
 #define GEN10_L3BANK_MASK   0x0F
+#define GEN12_GT_L3_MODE_MASK 0xFF
 
 #define GEN8_EU_DISABLE0   _MMIO(0x9134)
 #define   GEN8_EU_DIS0_S0_MASK 0xff
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 87d369cae22a..20d18cca5066 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_PERF_CONFIG  3
 #define DRM_I915_QUERY_MEMORY_REGIONS   4
 #define DRM_I915_QUERY_HWCONFIG_TABLE   5
+#define DRM_I915_QUERY_L3_BANK_COUNT6
 /* Must be kept compact -- no holes and well documented */
 
/**
-- 
2.25.1



[PATCH 2/3] drm/i915/uapi: Add query for hwconfig table

2021-06-10 Thread John . C . Harrison
From: Rodrigo Vivi 

GuC contains a consolidated table with a bunch of information about the
current device.

Previously, this information was spread across and hardcoded into all the
components, including GuC, i915 and various UMDs. The goal here is to consolidate
the data into GuC in a way that all interested components can grab the
very latest and synchronized information using a simple query.

As per most of the other queries, this one can be called twice.
Once with item.length=0 to determine the exact buffer size, then
allocate the user memory and call it again to retrieve the
table data. For example:
  struct drm_i915_query_item item = {
.query_id = DRM_I915_QUERY_HWCONFIG_TABLE,
  };
  query.items_ptr = (int64_t) &item;
  query.num_items = 1;

  ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query));

  if (item.length <= 0)
return -ENOENT;

  data = malloc(item.length);
  item.data_ptr = (int64_t) data;
  ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query));

  // Parse the data as appropriate...

The returned array is a simple and flexible KLV (Key/Length/Value)
formatted table. For example, it could be just:
  enum device_attr {
 ATTR_SOME_VALUE = 0,
 ATTR_SOME_MASK  = 1,
  };

  static const u32 hwconfig[] = {
  ATTR_SOME_VALUE,
  1, // Value Length in DWords
  8, // Value

  ATTR_SOME_MASK,
  3,
  0x00, 0x, 0xFF00,
  };

The attribute ids are defined in a hardware spec. The current list as
known to the i915 driver can be found in i915/gt/intel_guc_hwconfig_types.h

Cc: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Michal Wajdeczko 
Cc: Slawomir Milczarek 
Signed-off-by: Rodrigo Vivi 
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_query.c | 23 +++
 include/uapi/drm/i915_drm.h   |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index e49da36c62fb..96bd8fb3e895 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -480,12 +480,35 @@ static int query_memregion_info(struct drm_i915_private 
*i915,
return total_length;
 }
 
+static int query_hwconfig_table(struct drm_i915_private *i915,
+   struct drm_i915_query_item *query_item)
+{
+   struct intel_gt *gt = &i915->gt;
+   struct intel_guc_hwconfig *hwconfig = >->uc.guc.hwconfig;
+
+   if (!hwconfig->size || !hwconfig->ptr)
+   return -ENODEV;
+
+   if (query_item->length == 0)
+   return hwconfig->size;
+
+   if (query_item->length < hwconfig->size)
+   return -EINVAL;
+
+   if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
+hwconfig->ptr, hwconfig->size))
+   return -EFAULT;
+
+   return hwconfig->size;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
query_engine_info,
query_perf_config,
query_memregion_info,
+   query_hwconfig_table,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c2c7759b7d2e..87d369cae22a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2233,6 +2233,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_ENGINE_INFO 2
 #define DRM_I915_QUERY_PERF_CONFIG  3
 #define DRM_I915_QUERY_MEMORY_REGIONS   4
+#define DRM_I915_QUERY_HWCONFIG_TABLE   5
 /* Must be kept compact -- no holes and well documented */
 
/**
-- 
2.25.1



[PATCH 1/3] drm/i915/guc: Add fetch of hwconfig table

2021-06-10 Thread John . C . Harrison
From: John Harrison 

Implement support for fetching the hardware description table from the
GuC. The call is made twice - once without a destination buffer to
query the size and then a second time to fill in the buffer.

This patch also adds a header file which lists all the attribute values
currently defined for the table. This is included for reference as
these are not currently used by the i915 driver itself.

Note that the table is only available on ADL-P and later platforms.

Cc: Michal Wajdeczko 
Signed-off-by: Rodrigo Vivi 
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 .../gpu/drm/i915/gt/intel_hwconfig_types.h| 102 +++
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|   3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 167 ++
 .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |   6 +
 9 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 2adb6b420c7c..8e957ca7c9f1 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -187,6 +187,7 @@ i915-y += gt/uc/intel_uc.o \
  gt/uc/intel_guc_log.o \
  gt/uc/intel_guc_log_debugfs.o \
  gt/uc/intel_guc_submission.o \
+ gt/uc/intel_guc_hwconfig.o \
  gt/uc/intel_huc.o \
  gt/uc/intel_huc_debugfs.o \
  gt/uc/intel_huc_fw.o
diff --git a/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h 
b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
new file mode 100644
index ..b09c0f65b93a
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef _INTEL_HWCONFIG_TYPES_H_
+#define _INTEL_HWCONFIG_TYPES_H_
+
+/**
+ * enum intel_hwconfig - Global definition of hwconfig table attributes
+ *
+ * Intel devices provide a KLV (Key/Length/Value) table containing
+ * the static hardware configuration for that platform.
+ * This header defines the current attribute keys for this KLV.
+ */
+enum intel_hwconfig {
+   INTEL_HWCONFIG_MAX_SLICES_SUPPORTED = 1,
+   INTEL_HWCONFIG_MAX_DUAL_SUBSLICES_SUPPORTED,/* 2 */
+   INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS,  /* 3 */
+   INTEL_HWCONFIG_NUM_PIXEL_PIPES, /* 4 */
+   INTEL_HWCONFIG_DEPRECATED_MAX_NUM_GEOMETRY_PIPES,   /* 5 */
+   INTEL_HWCONFIG_DEPRECATED_L3_CACHE_SIZE_IN_KB,  /* 6 */
+   INTEL_HWCONFIG_DEPRECATED_L3_BANK_COUNT,/* 7 */
+   INTEL_HWCONFIG_L3_CACHE_WAYS_SIZE_IN_BYTES, /* 8 */
+   INTEL_HWCONFIG_L3_CACHE_WAYS_PER_SECTOR,/* 9 */
+   INTEL_HWCONFIG_MAX_MEMORY_CHANNELS, /* 10 */
+   INTEL_HWCONFIG_MEMORY_TYPE, /* 11 */
+   INTEL_HWCONFIG_CACHE_TYPES, /* 12 */
+   INTEL_HWCONFIG_LOCAL_MEMORY_PAGE_SIZES_SUPPORTED,   /* 13 */
+   INTEL_HWCONFIG_DEPRECATED_SLM_SIZE_IN_KB,   /* 14 */
+   INTEL_HWCONFIG_NUM_THREADS_PER_EU,  /* 15 */
+   INTEL_HWCONFIG_TOTAL_VS_THREADS,/* 16 */
+   INTEL_HWCONFIG_TOTAL_GS_THREADS,/* 17 */
+   INTEL_HWCONFIG_TOTAL_HS_THREADS,/* 18 */
+   INTEL_HWCONFIG_TOTAL_DS_THREADS,/* 19 */
+   INTEL_HWCONFIG_TOTAL_VS_THREADS_POCS,   /* 20 */
+   INTEL_HWCONFIG_TOTAL_PS_THREADS,/* 21 */
+   INTEL_HWCONFIG_DEPRECATED_MAX_FILL_RATE,/* 22 */
+   INTEL_HWCONFIG_MAX_RCS, /* 23 */
+   INTEL_HWCONFIG_MAX_CCS, /* 24 */
+   INTEL_HWCONFIG_MAX_VCS, /* 25 */
+   INTEL_HWCONFIG_MAX_VECS,/* 26 */
+   INTEL_HWCONFIG_MAX_COPY_CS, /* 27 */
+   INTEL_HWCONFIG_DEPRECATED_URB_SIZE_IN_KB,   /* 28 */
+   INTEL_HWCONFIG_MIN_VS_URB_ENTRIES,  /* 29 */
+   INTEL_HWCONFIG_MAX_VS_URB_ENTRIES,  /* 30 */
+   INTEL_HWCONFIG_MIN_PCS_URB_ENTRIES, /* 31 */
+   INTEL_HWCONFIG_MAX_PCS_URB_ENTRIES, /* 32 */
+   INTEL_HWCONFIG_MIN_HS_URB_ENTRIES,  /* 33 */
+   INTEL_HWCONFIG_MAX_HS_URB_ENTRIES,  /* 34 */
+ 

[PATCH 0/3] Add support for querying hw info that UMDs need

2021-06-10 Thread John . C . Harrison
From: John Harrison 

Various UMDs require hardware configuration information about the
current platform. A bunch of static information is available in a
fixed table that can be retrieved from the GuC. Further information
can be calculated dynamically from fuse registers.

Signed-off-by: John Harrison 


John Harrison (2):
  drm/i915/guc: Add fetch of hwconfig table
  drm/i915/uapi: Add query for L3 bank count

Rodrigo Vivi (1):
  drm/i915/uapi: Add query for hwconfig table

 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/gt/intel_gt.c|  15 ++
 drivers/gpu/drm/i915/gt/intel_gt.h|   1 +
 .../gpu/drm/i915/gt/intel_hwconfig_types.h| 102 +++
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |   4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|   3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c   | 167 ++
 .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h   |  19 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |   6 +
 drivers/gpu/drm/i915/i915_query.c |  45 +
 drivers/gpu/drm/i915/i915_reg.h   |   1 +
 include/uapi/drm/i915_drm.h   |   2 +
 14 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_hwconfig_types.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h

-- 
2.25.1



Re: [Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 10:10 PM Jason Ekstrand  wrote:
>
> On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand  wrote:
> > On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter  
> > wrote:
> > > On Thu, Jun 10, 2021 at 11:39 AM Christian König
> > >  wrote:
> > > > Am 10.06.21 um 11:29 schrieb Tvrtko Ursulin:
> > > > > On 09/06/2021 22:29, Jason Ekstrand wrote:
> > > > >>
> > > > >> We've tried to keep it somewhat contained by doing most of the hard 
> > > > >> work
> > > > >> to prevent access of recycled objects via dma_fence_get_rcu_safe().
> > > > >> However, a quick grep of kernel sources says that, of the 30 
> > > > >> instances
> > > > >> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
> > > > >> It's likely there are bear traps in DRM and related subsystems just 
> > > > >> waiting
> > > > >> for someone to accidentally step in them.
> > > > >
> > > > > ...because dma_fence_get_rcu_safe appears to be about whether the
> > > > > *pointer* to the fence itself is rcu protected, not about the fence
> > > > > object itself.
> > > >
> > > > Yes, exactly that.
> >
> > The fact that both of you think this either means that I've completely
> > missed what's going on with RCUs here (possible but, in this case, I
> > think unlikely) or RCUs on dma fences should scare us all.
>
> Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
> such,  I'd like to ask a slightly different question:  What are the
> rules about what is allowed to be done under the RCU read lock and
> what guarantees does a driver need to provide?
>
> I think so far that we've all agreed on the following:
>
>  1. Freeing an unsignaled fence is ok as long as it doesn't have any
> pending callbacks.  (Callbacks should hold a reference anyway).
>
>  2. The pointer race solved by dma_fence_get_rcu_safe is real and
> requires the loop to sort out.
>
> But let's say I have a dma_fence pointer that I got from, say, calling
> dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
> with it under the RCU lock?  What assumptions can I make?  Is this
> code, for instance, ok?
>
> rcu_read_lock();
> fence = dma_resv_excl_fence(obj);
> idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> rcu_read_unlock();
>
> This code very much looks correct under the following assumptions:
>
>  1. A valid fence pointer stays alive under the RCU read lock
>  2. SIGNALED_BIT is set-once (it's never unset after being set).
>
> However, if it were, we wouldn't have dma_resv_test_signaled(), now
> would we? :-)
>
> The moment you introduce ANY dma_fence recycling that recycles a
> dma_fence within a single RCU grace period, all your assumptions break
> down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
> also have a little i915_request recycler to try and help with memory
> pressure scenarios in certain critical sections that also doesn't
> respect RCU grace periods.  And, as mentioned multiple times, our
> recycling leaks into every other driver because, thanks to i915's
> choice, the above 4-line code snippet isn't valid ANYWHERE in the
> kernel.
>
> So the question I'm raising isn't so much about the rules today.
> Today, we live in the wild wild west where everything is YOLO.  But
> where do we want to go?  Do we like this wild west world?  Do we want
> more consistency under the RCU read lock?  If so, what do we want the
> rules to be?
>
> One option would be to accept the wild-west world we live in and say
> "The RCU read lock gains you nothing.  If you want to touch the guts
> of a dma_fence, take a reference".  But, at that point, we're eating
> two atomics for every time someone wants to look at a dma_fence.  Do
> we want that?
>
> Alternatively, and this what I think Daniel and I were trying to
> propose here, is that we place some constraints on dma_fence
> recycling.  Specifically that, under the RCU read lock, the fence
> doesn't suddenly become a new fence.  All of the immutability and
> once-mutability guarantees of various bits of dma_fence hold as long
> as you have the RCU read lock.

Yeah this is suboptimal. Too many potential bugs, not enough benefits.

This entire __rcu business started so that there would be a lockless
way to get at fences, or at least the exclusive one. That did not
really pan out. I think we have a few options:

- drop the idea of rcu/lockless dma-fence access outright. A quick
sequence of grabbing the lock, acquiring the dma_fence and then
dropping your lock again is probably plenty good. There's a lot of
call_rcu and other stuff we could probably delete. I have no idea what
the perf impact across all the drivers would be.

- try to make all drivers follow some stricter rules. The trouble is
that at least with radeon dma_fence callbacks aren't even very
reliable (that's why it has its own dma_fence_wait implementation), so
things are wobbly anyway.

- live with the current situation, but radically delete all unsafe
interfaces. I.e. nothing is allowed to dir

[PATCH v5 1/1] drm/doc: document drm_mode_get_plane

2021-06-10 Thread Leandro Ribeiro
Add a small description and document struct fields of
drm_mode_get_plane.

Signed-off-by: Leandro Ribeiro 
---
 include/uapi/drm/drm_mode.h | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 9b6722d45f36..698559d9336b 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -312,16 +312,51 @@ struct drm_mode_set_plane {
__u32 src_w;
 };

+/**
+ * struct drm_mode_get_plane - Get plane metadata.
+ *
+ * Userspace can perform a GETPLANE ioctl to retrieve information about a
+ * plane.
+ *
+ * To retrieve the number of formats supported, set @count_format_types to zero
+ * and call the ioctl. @count_format_types will be updated with the value.
+ *
+ * To retrieve these formats, allocate an array with the memory needed to store
+ * @count_format_types formats. Point @format_type_ptr to this array and call
+ * the ioctl again (with @count_format_types still set to the value returned in
+ * the first ioctl call).
+ */
 struct drm_mode_get_plane {
+   /**
+* @plane_id: Object ID of the plane whose information should be
+* retrieved. Set by caller.
+*/
__u32 plane_id;

+   /** @crtc_id: Object ID of the current CRTC. */
__u32 crtc_id;
+   /** @fb_id: Object ID of the current fb. */
__u32 fb_id;

+   /**
+* @possible_crtcs: Bitmask of CRTCs compatible with the plane. CRTCs
+* are created and they receive an index, which corresponds to their
+* position in the bitmask. Bit N corresponds to
+* :ref:`CRTC index` N.
+*/
__u32 possible_crtcs;
+   /**
+* @gamma_size: Number of entries of the legacy gamma lookup table.
+* Deprecated.
+*/
__u32 gamma_size;

+   /** @count_format_types: Number of formats. */
__u32 count_format_types;
+   /**
+* @format_type_ptr: Pointer to ``__u32`` array of formats that are
+* supported by the plane. These formats do not require modifiers.
+*/
__u64 format_type_ptr;
 };

--
2.31.1



[PATCH v5 0/1] Document drm_mode_get_plane

2021-06-10 Thread Leandro Ribeiro
v2: possible_crtcs field is a bitmask, not a pointer. Suggested by
Ville Syrjälä 

v3: document how userspace should find out CRTC index. Also,
document that field 'gamma_size' represents the number of
entries in the lookup table. Suggested by Pekka Paalanen
 and Daniel Vetter 

v4: document IN and OUT fields and make the description more
concise. Suggested by Pekka Paalanen 

v5: CRTC index patch already merged, only patch to document drm_mode_get_plane
now. Added that gamma LUT size is deprecated and dropped incorrect text
documenting that plane number of formats may change from one ioctl to the
other. Suggested by Ville Syrjälä 

Leandro Ribeiro (1):
  drm/doc: document drm_mode_get_plane

 include/uapi/drm/drm_mode.h | 35 +++
 1 file changed, 35 insertions(+)

--
2.31.1



Re: [PATCH 0/7] dma-buf: Add an API for exporting sync files (v11)

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 3:11 PM Chia-I Wu  wrote:
>
> On Tue, May 25, 2021 at 2:18 PM Jason Ekstrand  wrote:
> > Modern userspace APIs like Vulkan are built on an explicit
> > synchronization model.  This doesn't always play nicely with the
> > implicit synchronization used in the kernel and assumed by X11 and
> > Wayland.  The client -> compositor half of the synchronization isn't too
> > bad, at least on intel, because we can control whether or not i915
> > synchronizes on the buffer and whether or not it's considered written.
> We might have an important use case for this half, for virtio-gpu and Chrome 
> OS.
>
> When the guest compositor acts as a proxy to connect guest apps to the
> host compositor, implicit fencing requires the guest compositor to do
> a wait before forwarding the buffer to the host compositor.  With this
> patch, the guest compositor can extract the dma-fence from the buffer,
> and if the fence is a virtio-gpu fence, forward both the fence and the
> buffer to the host compositor.  It will allow us to convert a
> guest-side wait into a host-side wait.

Yeah, I think the first half solves a lot of problems.  I'm rebasing
it now and will send a v12 series shortly.  I don't think there's a
lot standing between the first few patches and merging.  I've got IGT
tests and I'm pretty sure the code is good.  The last review cycle got
distracted with some renaming fun.

--Jason


Re: [PATCH 0/7] dma-buf: Add an API for exporting sync files (v11)

2021-06-10 Thread Chia-I Wu
On Tue, May 25, 2021 at 2:18 PM Jason Ekstrand  wrote:
> Modern userspace APIs like Vulkan are built on an explicit
> synchronization model.  This doesn't always play nicely with the
> implicit synchronization used in the kernel and assumed by X11 and
> Wayland.  The client -> compositor half of the synchronization isn't too
> bad, at least on intel, because we can control whether or not i915
> synchronizes on the buffer and whether or not it's considered written.
We might have an important use case for this half, for virtio-gpu and Chrome OS.

When the guest compositor acts as a proxy to connect guest apps to the
host compositor, implicit fencing requires the guest compositor to do
a wait before forwarding the buffer to the host compositor.  With this
patch, the guest compositor can extract the dma-fence from the buffer,
and if the fence is a virtio-gpu fence, forward both the fence and the
buffer to the host compositor.  It will allow us to convert a
guest-side wait into a host-side wait.


Re: [Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand  wrote:
> On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter  wrote:
> > On Thu, Jun 10, 2021 at 11:39 AM Christian König
> >  wrote:
> > > Am 10.06.21 um 11:29 schrieb Tvrtko Ursulin:
> > > > On 09/06/2021 22:29, Jason Ekstrand wrote:
> > > >>
> > > >> We've tried to keep it somewhat contained by doing most of the hard 
> > > >> work
> > > >> to prevent access of recycled objects via dma_fence_get_rcu_safe().
> > > >> However, a quick grep of kernel sources says that, of the 30 instances
> > > >> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
> > > >> It's likely there are bear traps in DRM and related subsystems just waiting
> > > >> for someone to accidentally step in them.
> > > >
> > > > ...because dma_fence_get_rcu_safe appears to be about whether the
> > > > *pointer* to the fence itself is rcu protected, not about the fence
> > > > object itself.
> > >
> > > Yes, exactly that.
>
> The fact that both of you think this either means that I've completely
> missed what's going on with RCUs here (possible but, in this case, I
> think unlikely) or RCUs on dma fences should scare us all.

Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
such,  I'd like to ask a slightly different question:  What are the
rules about what is allowed to be done under the RCU read lock and
what guarantees does a driver need to provide?

I think so far that we've all agreed on the following:

 1. Freeing an unsignaled fence is ok as long as it doesn't have any
pending callbacks.  (Callbacks should hold a reference anyway).

 2. The pointer race solved by dma_fence_get_rcu_safe is real and
requires the loop to sort out.

But let's say I have a dma_fence pointer that I got from, say, calling
dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
with it under the RCU lock?  What assumptions can I make?  Is this
code, for instance, ok?

rcu_read_lock();
fence = dma_resv_excl_fence(obj);
idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
rcu_read_unlock();

This code very much looks correct under the following assumptions:

 1. A valid fence pointer stays alive under the RCU read lock
 2. SIGNALED_BIT is set-once (it's never unset after being set).

However, if it were, we wouldn't have dma_resv_test_signaled(), now
would we? :-)

The moment you introduce ANY dma_fence recycling that recycles a
dma_fence within a single RCU grace period, all your assumptions break
down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
also have a little i915_request recycler to try and help with memory
pressure scenarios in certain critical sections that also doesn't
respect RCU grace periods.  And, as mentioned multiple times, our
recycling leaks into every other driver because, thanks to i915's
choice, the above 4-line code snippet isn't valid ANYWHERE in the
kernel.

So the question I'm raising isn't so much about the rules today.
Today, we live in the wild wild west where everything is YOLO.  But
where do we want to go?  Do we like this wild west world?  Do we want
more consistency under the RCU read lock?  If so, what do we want the
rules to be?

One option would be to accept the wild-west world we live in and say
"The RCU read lock gains you nothing.  If you want to touch the guts
of a dma_fence, take a reference".  But, at that point, we're eating
two atomics for every time someone wants to look at a dma_fence.  Do
we want that?

Alternatively, and this what I think Daniel and I were trying to
propose here, is that we place some constraints on dma_fence
recycling.  Specifically that, under the RCU read lock, the fence
doesn't suddenly become a new fence.  All of the immutability and
once-mutability guarantees of various bits of dma_fence hold as long
as you have the RCU read lock.

--Jason


[Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered

2021-06-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #8 from Lahfa Samy (s...@lahfa.xyz) ---
In the meantime, I'll be trying to find a way to reproduce this issue reliably,
if you have any plans on writing a patch for this issue, I would be glad to
help in any testing in order to help squash this bug.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Rodrigo Vivi
On Thu, Jun 10, 2021 at 07:47:31PM +0200, Maxime Ripard wrote:
> New KMS properties come with a bunch of requirements to avoid each
> driver from running their own, inconsistent, set of properties,
> eventually leading to issues like property conflicts, inconsistencies
> between drivers and semantics, etc.
> 
> Let's document what we expect.
> 
> Cc: Alexandre Belloni 
> Cc: Alexandre Torgue 
> Cc: Alex Deucher 
> Cc: Alison Wang 
> Cc: Alyssa Rosenzweig 
> Cc: Andrew Jeffery 
> Cc: Andrzej Hajda 
> Cc: Anitha Chrisanthus 
> Cc: Benjamin Gaignard 
> Cc: Ben Skeggs 
> Cc: Boris Brezillon 
> Cc: Brian Starkey 
> Cc: Chen Feng 
> Cc: Chen-Yu Tsai 
> Cc: Christian Gmeiner 
> Cc: "Christian König" 
> Cc: Chun-Kuang Hu 
> Cc: Edmund Dea 
> Cc: Eric Anholt 
> Cc: Fabio Estevam 
> Cc: Gerd Hoffmann 
> Cc: Haneen Mohammed 
> Cc: Hans de Goede 
> Cc: "Heiko Stübner" 
> Cc: Huang Rui 
> Cc: Hyun Kwon 
> Cc: Inki Dae 
> Cc: Jani Nikula 
> Cc: Jernej Skrabec 
> Cc: Jerome Brunet 
> Cc: Joel Stanley 
> Cc: John Stultz 
> Cc: Jonas Karlman 
> Cc: Jonathan Hunter 
> Cc: Joonas Lahtinen 
> Cc: Joonyoung Shim 
> Cc: Jyri Sarha 
> Cc: Kevin Hilman 
> Cc: Kieran Bingham 
> Cc: Krzysztof Kozlowski 
> Cc: Kyungmin Park 
> Cc: Laurent Pinchart 
> Cc: Linus Walleij 
> Cc: Liviu Dudau 
> Cc: Lucas Stach 
> Cc: Ludovic Desroches 
> Cc: Marek Vasut 
> Cc: Martin Blumenstingl 
> Cc: Matthias Brugger 
> Cc: Maxime Coquelin 
> Cc: Maxime Ripard 
> Cc: Melissa Wen 
> Cc: Neil Armstrong 
> Cc: Nicolas Ferre 
> Cc: "Noralf Trønnes" 
> Cc: NXP Linux Team 
> Cc: Oleksandr Andrushchenko 
> Cc: Patrik Jakobsson 
> Cc: Paul Cercueil 
> Cc: Pengutronix Kernel Team 
> Cc: Philippe Cornu 
> Cc: Philipp Zabel 
> Cc: Qiang Yu 
> Cc: Rob Clark 
> Cc: Robert Foss 
> Cc: Rob Herring 
> Cc: Rodrigo Siqueira 
> Cc: Rodrigo Vivi 
> Cc: Roland Scheidegger 
> Cc: Russell King 
> Cc: Sam Ravnborg 
> Cc: Sandy Huang 
> Cc: Sascha Hauer 
> Cc: Sean Paul 
> Cc: Seung-Woo Kim 
> Cc: Shawn Guo 
> Cc: Stefan Agner 
> Cc: Steven Price 
> Cc: Sumit Semwal 
> Cc: Thierry Reding 
> Cc: Tian Tao 
> Cc: Tomeu Vizoso 
> Cc: Tomi Valkeinen 
> Cc: VMware Graphics 
> Cc: Xinliang Liu 
> Cc: Xinwei Kong 
> Cc: Yannick Fertre 
> Cc: Zack Rusin 
> Reviewed-by: Daniel Vetter 
> Signed-off-by: Maxime Ripard 
> 
> ---
> 
> Changes from v2:
>   - Take into account the feedback from Laurent and Lidiu to no longer
> force generic properties, but prefix vendor-specific properties with
> the vendor name
> 
> Changes from v1:
>   - Typos and wording reported by Daniel and Alex
> ---
>  Documentation/gpu/drm-kms.rst | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 87e5023e3f55..bbe254dca635 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -463,6 +463,33 @@ KMS Properties
>  This section of the documentation is primarily aimed at user-space 
> developers.
>  For the driver APIs, see the other sections.
>  
> +Requirements
> +
> +
> +KMS drivers might need to add extra properties to support new features.
> +Each new property introduced in a driver needs to meet a few
> +requirements, in addition to the one mentioned above:
> +
> +- Before the introduction of any vendor-specific properties, they must
> +  be first checked against the generic ones to avoid any conflict or
> +  redundancy.
> +
> +- Vendor-specific properties must be prefixed by the vendor's name,
> +  following the syntax "$vendor:$property".
> +
> +- Generic properties must be standardized, with some documentation to
> +  describe how the property can be used.
> +
> +- Generic properties must provide a generic helper in the core code to
> +  register that property on the object it attaches to.
> +
> +- Generic properties content must be decoded by the core and provided in
> +  the object's associated state structure. That includes anything
> +  drivers might want to precompute, like :c:type:`struct drm_clip_rect
> +  ` for planes.
> +
> +- An IGT test should be submitted.
> +
>  Property Types and Blob Property Support
>  

Acked-by: Rodrigo Vivi 

>  
> -- 
> 2.31.1
> 


Re: [PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Liviu Dudau
Hi Maxime,

On Thu, Jun 10, 2021 at 07:47:31PM +0200, Maxime Ripard wrote:
> New KMS properties come with a bunch of requirements to avoid each
> driver from running their own, inconsistent, set of properties,
> eventually leading to issues like property conflicts, inconsistencies
> between drivers and semantics, etc.
> 
> Let's document what we expect.
> 
> Cc: Alexandre Belloni 
> Cc: Alexandre Torgue 
> Cc: Alex Deucher 
> Cc: Alison Wang 
> Cc: Alyssa Rosenzweig 
> Cc: Andrew Jeffery 
> Cc: Andrzej Hajda 
> Cc: Anitha Chrisanthus 
> Cc: Benjamin Gaignard 
> Cc: Ben Skeggs 
> Cc: Boris Brezillon 
> Cc: Brian Starkey 
> Cc: Chen Feng 
> Cc: Chen-Yu Tsai 
> Cc: Christian Gmeiner 
> Cc: "Christian König" 
> Cc: Chun-Kuang Hu 
> Cc: Edmund Dea 
> Cc: Eric Anholt 
> Cc: Fabio Estevam 
> Cc: Gerd Hoffmann 
> Cc: Haneen Mohammed 
> Cc: Hans de Goede 
> Cc: "Heiko Stübner" 
> Cc: Huang Rui 
> Cc: Hyun Kwon 
> Cc: Inki Dae 
> Cc: Jani Nikula 
> Cc: Jernej Skrabec 
> Cc: Jerome Brunet 
> Cc: Joel Stanley 
> Cc: John Stultz 
> Cc: Jonas Karlman 
> Cc: Jonathan Hunter 
> Cc: Joonas Lahtinen 
> Cc: Joonyoung Shim 
> Cc: Jyri Sarha 
> Cc: Kevin Hilman 
> Cc: Kieran Bingham 
> Cc: Krzysztof Kozlowski 
> Cc: Kyungmin Park 
> Cc: Laurent Pinchart 
> Cc: Linus Walleij 
> Cc: Liviu Dudau 
> Cc: Lucas Stach 
> Cc: Ludovic Desroches 
> Cc: Marek Vasut 
> Cc: Martin Blumenstingl 
> Cc: Matthias Brugger 
> Cc: Maxime Coquelin 
> Cc: Maxime Ripard 
> Cc: Melissa Wen 
> Cc: Neil Armstrong 
> Cc: Nicolas Ferre 
> Cc: "Noralf Trønnes" 
> Cc: NXP Linux Team 
> Cc: Oleksandr Andrushchenko 
> Cc: Patrik Jakobsson 
> Cc: Paul Cercueil 
> Cc: Pengutronix Kernel Team 
> Cc: Philippe Cornu 
> Cc: Philipp Zabel 
> Cc: Qiang Yu 
> Cc: Rob Clark 
> Cc: Robert Foss 
> Cc: Rob Herring 
> Cc: Rodrigo Siqueira 
> Cc: Rodrigo Vivi 
> Cc: Roland Scheidegger 
> Cc: Russell King 
> Cc: Sam Ravnborg 
> Cc: Sandy Huang 
> Cc: Sascha Hauer 
> Cc: Sean Paul 
> Cc: Seung-Woo Kim 
> Cc: Shawn Guo 
> Cc: Stefan Agner 
> Cc: Steven Price 
> Cc: Sumit Semwal 
> Cc: Thierry Reding 
> Cc: Tian Tao 
> Cc: Tomeu Vizoso 
> Cc: Tomi Valkeinen 
> Cc: VMware Graphics 
> Cc: Xinliang Liu 
> Cc: Xinwei Kong 
> Cc: Yannick Fertre 
> Cc: Zack Rusin 
> Reviewed-by: Daniel Vetter 
> Signed-off-by: Maxime Ripard 
> 
> ---
> 
> Changes from v2:
>   - Take into account the feedback from Laurent and Lidiu to no longer
> force generic properties, but prefix vendor-specific properties with
> the vendor name
> 
> Changes from v1:
>   - Typos and wording reported by Daniel and Alex
> ---
>  Documentation/gpu/drm-kms.rst | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 87e5023e3f55..bbe254dca635 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -463,6 +463,33 @@ KMS Properties
>  This section of the documentation is primarily aimed at user-space 
> developers.
>  For the driver APIs, see the other sections.
>  
> +Requirements
> +
> +
> +KMS drivers might need to add extra properties to support new features.
> +Each new property introduced in a driver needs to meet a few
> +requirements, in addition to the one mentioned above:
> +
> +- Before the introduction of any vendor-specific properties, they must
> +  be first checked against the generic ones to avoid any conflict or
> +  redundancy.
> +
> +- Vendor-specific properties must be prefixed by the vendor's name,
> +  following the syntax "$vendor:$property".
> +
> +- Generic properties must be standardized, with some documentation to
> +  describe how the property can be used.
> +
> +- Generic properties must provide a generic helper in the core code to
> +  register that property on the object it attaches to.
> +
> +- Generic properties content must be decoded by the core and provided in
> +  the object's associated state structure. That includes anything
> +  drivers might want to precompute, like :c:type:`struct drm_clip_rect
> +  ` for planes.
> +
> +- An IGT test should be submitted.
> +
>  Property Types and Blob Property Support
>  

Looks nice, thanks for the work!

Reviewed-by: Liviu Dudau 

Best regards,
Liviu

>  
> -- 
> 2.31.1
> 

-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Christian König




Am 10.06.21 um 19:11 schrieb Daniel Vetter:

On Thu, Jun 10, 2021 at 06:54:13PM +0200, Christian König wrote:

Am 10.06.21 um 18:37 schrieb Daniel Vetter:

On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  wrote:

On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 1:51 AM Christian König
 wrote:

Am 09.06.21 um 23:29 schrieb Jason Ekstrand:

This helper existed to handle the weird corner-cases caused by using
SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
that anymore (i915 was the only real user), dma_fence_get_rcu is
sufficient.  The one slightly annoying thing we have to deal with here
is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
ends up being 3 lines instead of 1.

That's an outright NAK.

The loop in dma_fence_get_rcu_safe is necessary because the underlying
fence object can be replaced while taking the reference.

Right.  I had missed a bit of that when I first read through it.  I
see the need for the loop now.  But there are some other tricky bits
in there besides just the loop.

I thought that's what the kref_get_unless_zero was for in
dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
dma_fence_get_rcu around, since that should either be a kref_get or
it's just unsafe to call it ...

AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
your fence and it's never recycled.

Where the loop comes in is if you have someone come along, under the
RCU write lock or not, and swap out the pointer and unref it while
you're trying to fetch it.  In this case, if you just write the three
lines I duplicated throughout this patch, you'll end up with NULL if
you (partially) lose the race.  The loop exists to ensure that you get
either the old pointer or the new pointer and you only ever get NULL
if somewhere during the mess, the pointer actually gets set to NULL.
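The retry loop described above can be sketched with plain C11 atomics. This is a userspace model of the shape of dma_fence_get_rcu_safe(), not the kernel code: struct fence, get_unless_zero() and fence_get_safe() below are illustrative stand-ins for the real kernel definitions.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

/* Illustrative stand-in for struct dma_fence: just a refcount. */
struct fence {
	atomic_int refcount;
};

/* kref_get_unless_zero() stand-in: take a reference only if the
 * object still holds at least one, i.e. it is not already dying. */
static int get_unless_zero(struct fence *f)
{
	int v = atomic_load(&f->refcount);

	while (v != 0) {
		if (atomic_compare_exchange_weak(&f->refcount, &v, v + 1))
			return 1;	/* got a reference */
	}
	return 0;			/* refcount hit zero, caller retries */
}

/* The dma_fence_get_rcu_safe() shape: re-read the published pointer,
 * try to grab a reference, and only return the fence if it is still
 * the pointer in the slot afterwards.  Returns NULL only when the
 * slot itself is NULL, never because we half-lost a race. */
static struct fence *fence_get_safe(struct fence *_Atomic *slot)
{
	for (;;) {
		struct fence *f = atomic_load(slot); /* rcu_dereference() stand-in */

		if (!f)
			return NULL;
		if (!get_unless_zero(f))
			continue;	/* fence died; publisher updates the slot */
		if (f == atomic_load(slot))
			return f;	/* still the published fence, ref is good */
		/* slot was swapped while we took the ref: drop it and retry */
		atomic_fetch_sub(&f->refcount, 1);
	}
}
```

The key property is the re-check against the slot after taking the reference: the caller gets either the old fence, the new fence, or NULL, and never a half-valid pointer. RCU (or SLAB_TYPESAFE_BY_RCU) is still what keeps the memory readable between the load and the kref attempt.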

It's not that easy. At least not for dma_resv.

The thing is, you can't just go in and replace the write fence with
something else. There's supposed to be some ordering here (how much we
actually still follow that or not is a bit another question, that I'm
trying to answer with an audit of lots of drivers), which means if you
replace e.g. the exclusive fence, the previous fence will _not_ just
get freed. Because the next exclusive fence needs to wait for that to
finish first.

Conceptually the refcount will _only_ go to 0 once all later
dependencies have seen it get signalled, and once the fence itself has
been signalled.

I think that's the point where it breaks.

See IIRC radeon for example doesn't keep unsignaled fences around when
nobody is interested in them. And I think nouveau does it that way as well.

So for example you can have the following
1. Submission to 3D ring, this creates fence A.
2. Fence A is put as an exclusive fence in a dma_resv object.
3. Submission to 3D ring, this creates fence B.
4. Fence B is replacing fence A as the exclusive fence in the dma_resv
object.

Fence A is replaced and therefore destroyed while it is not even close to be
signaled. But the replacement is perfectly ok, since fence B is submitted to
the same ring.

When somebody would use dma_fence_get_rcu on the exclusive fence and get
NULL it would fail to wait for the submissions. You don't really need the
SLAB_TYPESAFE_BY_RCU for this to blow up in your face.
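Christian's premise in steps 1-4 can be made concrete with a small seqno model (illustrative only, not the radeon implementation): fences on one ring retire in submission order, so anyone who successfully waits for fence B has implicitly waited for fence A too.

```c
#include <assert.h>

/* Minimal seqno model of per-ring fence ordering (illustrative). */
struct ring {
	unsigned int completed;		/* last seqno the ring has retired */
};

struct ringfence {
	struct ring *ring;
	unsigned int seqno;		/* assigned in submission order */
};

/* A fence is signaled once the ring has retired its seqno; because
 * seqnos are monotonic, signaling B implies signaling every earlier
 * fence on the same ring. */
static int fence_signaled(const struct ringfence *f)
{
	return f->ring->completed >= f->seqno;
}
```

That is why replacing A with B on the same ring is safe for any reader who ends up waiting on B, and also why a reader whose dma_fence_get_rcu() returns NULL in that window silently skips a wait it actually needed.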

Uh that's wild ...

I thought that's impossible, but in dma_fence_release() we only complain
if there's both waiters and the fence isn't signalled yet. I had no idea.


We could of course change that rule, amdgpu for example is always keeping
fences around until they are signaled. But IIRC that's how it was for radeon
like forever.

Yeah I think we could, but then we need to do a few things:
- document that de facto only get_rcu_safe is ok to use
- delete get_rcu, it's not really a safe thing to do anywhere


Well I would rename dma_fence_get_rcu into dma_fence_get_unless_zero.

And then we can rename dma_fence_get_rcu_safe() into dma_fence_get_rcu().

Christian.



-Daniel


Regards,
Christian.


A signalled fence might as well not exist, so if that's what
happened in that tiny window, then yes a legal scenario is the
following:

thread A:
- rcu_dereference(resv->exclusive_fence);

thread B:
- dma_fence signals, retires, drops refcount to 0
- sets the exclusive fence to NULL
- creates a new dma_fence
- sets the exclusive fence to that new fence

thread A:
- kref_get_unless_zero fails, we report that the exclusive fence slot is NULL

Ofc normally we're fully pipelined, and we lazily clear slots, so no
one ever writes the fence ptr to NULL. But conceptually it's totally
fine, and an indistinguishable sequence of events from the point of
view of thread A.

Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
big time. The only reason you need _unsafe is if you have

Re: [PATCH V2 1/2] dt-bindings: display: bridge: lvds-codec: Document LVDS data mapping select

2021-06-10 Thread Rob Herring
On Wed, 02 Jun 2021 22:37:30 +0200, Marek Vasut wrote:
> Decoder input LVDS format is a property of the decoder chip or even
> its strapping. Add DT property data-mapping the same way lvds-panel
> does, to define the LVDS data mapping.
> 
> Signed-off-by: Marek Vasut 
> Cc: Laurent Pinchart 
> Cc: Rob Herring 
> Cc: Sam Ravnborg 
> Cc: devicet...@vger.kernel.org
> To: dri-devel@lists.freedesktop.org
> ---
> V2: - Use allOf
> - Move the data-mapping to endpoint
> ---
>  .../bindings/display/bridge/lvds-codec.yaml   | 53 ++-
>  1 file changed, 41 insertions(+), 12 deletions(-)
> 

Reviewed-by: Rob Herring 


Re: [PATCH v10 07/10] mm: Device exclusive memory access

2021-06-10 Thread Peter Xu
On Thu, Jun 10, 2021 at 10:18:25AM +1000, Alistair Popple wrote:
> > > The main problem is split_huge_pmd_address() unconditionally calls a mmu
> > > notifier so I would need to plumb in passing an owner everywhere which 
> > > could
> > > get messy.
> > 
> > Could I ask why?  split_huge_pmd_address() will notify with CLEAR, so I'm a 
> > bit
> > confused why we need to pass over the owner.
> 
> Sure, it is the same reason we need to pass it for the exclusive notifier.
> Any invalidation during the make exclusive operation will break the mmu read
> side critical section forcing a retry of the operation. The owner field is 
> what
> is used to filter out invalidations (such as the exclusive invalidation) that
> don't need to be retried.

Do you mean the mmu_interval_read_begin|retry() calls?

Hmm, the thing is.. to me FOLL_SPLIT_PMD should have similar effect to explicit
call split_huge_pmd_address(), afaict.  Since both of them use 
__split_huge_pmd()
internally which will generate that unwanted CLEAR notify.

If that's the case, I think it fails because split_huge_pmd_address() will
trigger that CLEAR notify unconditionally (even if it's not a thp; not sure
whether it should be optimized to not notify at all... definitely another
story), while FOLL_SPLIT_PMD will skip the notify as it calls split_huge_pmd()
instead, who checks the pmd before calling __split_huge_pmd().

Does it also mean that if there's a real THP it won't really work?  As then
FOLL_SPLIT_PMD will start to trigger that CLEAR notify too, I think..

-- 
Peter Xu



Re: nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer dereference in nouveau_bo_sync_for_device

2021-06-10 Thread Christian König

Am 10.06.21 um 19:50 schrieb Ondrej Zary:

On Thursday 10 June 2021 08:43:06 Christian König wrote:

Am 09.06.21 um 22:00 schrieb Ondrej Zary:

On Wednesday 09 June 2021 11:21:05 Christian König wrote:

Am 09.06.21 um 09:10 schrieb Ondrej Zary:

On Wednesday 09 June 2021, Christian König wrote:

Am 09.06.21 um 08:57 schrieb Ondrej Zary:

[SNIP]

Thanks for the heads up. So the problem with my patch is already fixed,
isn't it?

The NULL pointer dereference in nouveau_bo_wr16 introduced in
141b15e59175aa174ca1f7596188bd15a7ca17ba was fixed by
aea656b0d05ec5b8ed5beb2f94c4dd42ea834e9d.

That's the bug I hit when bisecting the original problem:
NULL pointer dereference in nouveau_bo_sync_for_device
It's caused by:
# first bad commit: [e34b8feeaa4b65725b25f49c9b08a0f8707e8e86] drm/ttm: merge 
ttm_dma_tt back into ttm_tt

Good that I've asked :)

Ok that's a bit strange. e34b8feeaa4b65725b25f49c9b08a0f8707e8e86 was
created mostly automated.

Do you have the original backtrace of that NULL pointer deref once more?

The original backtrace is here: 
https://lkml.org/lkml/2021/6/5/350

And the problem is that ttm_dma->dma_address is NULL, right? Mhm, I
don't see how that can happen since nouveau is using ttm_sg_tt_init().

Apart from that what nouveau does here is rather questionable since you
need a coherent architecture for most things anyway, but that's not what
we are trying to fix here.

Can you try to narrow down if ttm_sg_tt_init is called before calling
this function for the tt object in question?

ttm_sg_tt_init is not called:
[   12.150124] nouveau :01:00.0: DRM: VRAM: 31 MiB
[   12.150133] nouveau :01:00.0: DRM: GART: 128 MiB
[   12.150143] nouveau :01:00.0: DRM: BMP version 5.6
[   12.150151] nouveau :01:00.0: DRM: No DCB data found in VBIOS
[   12.151362] ttm_tt_init
[   12.151370] ttm_tt_init_fields
[   12.151374] ttm_tt_alloc_page_directory
[   12.151615] BUG: kernel NULL pointer dereference, address: 

Please add dump_stack(); to ttm_tt_init() and report back with the
backtrace.

I can't see how this is called from the nouveau code, only possibility I
see is that it is maybe called through the AGP code somehow.

Yes, you're right:
[   13.192663] Call Trace:
[   13.192678]  dump_stack+0x54/0x68
[   13.192690]  ttm_tt_init+0x11/0x8a [ttm]
[   13.192699]  ttm_agp_tt_create+0x39/0x51 [ttm]
[   13.192840]  nouveau_ttm_tt_create+0x17/0x22 [nouveau]
[   13.192856]  ttm_tt_create+0x78/0x8c [ttm]
[   13.192864]  ttm_bo_handle_move_mem+0x7d/0xca [ttm]
[   13.192873]  ttm_bo_validate+0x92/0xc8 [ttm]
[   13.192883]  ttm_bo_init_reserved+0x216/0x243 [ttm]
[   13.192892]  ttm_bo_init+0x45/0x65 [ttm]
[   13.193018]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
[   13.193150]  nouveau_bo_init+0x8c/0x94 [nouveau]
[   13.193273]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
[   13.193407]  nouveau_bo_new+0x44/0x57 [nouveau]
[   13.193537]  nouveau_channel_prep+0xa3/0x269 [nouveau]
[   13.193665]  nouveau_channel_new+0x3c/0x5f7 [nouveau]
[   13.193679]  ? slab_free_freelist_hook+0x3b/0xa7
[   13.193686]  ? kfree+0x9e/0x11a
[   13.193781]  ? nvif_object_sclass_put+0xd/0x16 [nouveau]
[   13.193908]  nouveau_drm_device_init+0x2e2/0x646 [nouveau]
[   13.193924]  ? pci_enable_device_flags+0x1e/0xac
[   13.194052]  nouveau_drm_probe+0xeb/0x188 [nouveau]
[   13.194182]  ? nouveau_drm_device_init+0x646/0x646 [nouveau]
[   13.194195]  pci_device_probe+0x89/0xe9
[   13.194205]  really_probe+0x127/0x2a7
[   13.194212]  driver_probe_device+0x5b/0x87
[   13.194219]  device_driver_attach+0x2e/0x41
[   13.194226]  __driver_attach+0x7c/0x83
[   13.194232]  bus_for_each_dev+0x4c/0x66
[   13.194238]  driver_attach+0x14/0x16
[   13.194244]  ? device_driver_attach+0x41/0x41
[   13.194251]  bus_add_driver+0xc5/0x16c
[   13.194258]  driver_register+0x87/0xb9
[   13.194265]  __pci_register_driver+0x38/0x3b
[   13.194271]  ? 0xf0c0d000
[   13.194362]  nouveau_drm_init+0x14c/0x1000 [nouveau]

How is ttm_dma_tt->dma_address allocated?


Mhm, I need to double check how AGP is supposed to work.

Since barely anybody is using it these days it is something which breaks 
from time to time.


Thanks for the backtrace,
Christian.


  I cannot find any assignment
executed (in the working code):

$ git grep dma_address\ = drivers/gpu/
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:   sg->sgl->dma_address = 
addr;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:dma_address = 
&dma->dma_address[offset >> PAGE_SHIFT];
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:dma_address = (mm_node->start 
<< PAGE_SHIFT) + offset;
d

Re: nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer dereference in nouveau_bo_sync_for_device

2021-06-10 Thread Ondrej Zary
On Thursday 10 June 2021 08:43:06 Christian König wrote:
> 
> Am 09.06.21 um 22:00 schrieb Ondrej Zary:
> > On Wednesday 09 June 2021 11:21:05 Christian König wrote:
> >> Am 09.06.21 um 09:10 schrieb Ondrej Zary:
> >>> On Wednesday 09 June 2021, Christian König wrote:
>  Am 09.06.21 um 08:57 schrieb Ondrej Zary:
> > [SNIP]
> >> Thanks for the heads up. So the problem with my patch is already fixed,
> >> isn't it?
> > The NULL pointer dereference in nouveau_bo_wr16 introduced in
> > 141b15e59175aa174ca1f7596188bd15a7ca17ba was fixed by
> > aea656b0d05ec5b8ed5beb2f94c4dd42ea834e9d.
> >
> > That's the bug I hit when bisecting the original problem:
> > NULL pointer dereference in nouveau_bo_sync_for_device
> > It's caused by:
> > # first bad commit: [e34b8feeaa4b65725b25f49c9b08a0f8707e8e86] drm/ttm: 
> > merge ttm_dma_tt back into ttm_tt
>  Good that I've asked :)
> 
>  Ok that's a bit strange. e34b8feeaa4b65725b25f49c9b08a0f8707e8e86 was
>  created mostly automated.
> 
>  Do you have the original backtrace of that NULL pointer deref once more?
> >>> The original backtrace is here: 
> >>> https://lkml.org/lkml/2021/6/5/350
> >> And the problem is that ttm_dma->dma_address is NULL, right? Mhm, I
> >> don't see how that can happen since nouveau is using ttm_sg_tt_init().
> >>
> >> Apart from that what nouveau does here is rather questionable since you
> >> need a coherent architecture for most things anyway, but that's not what
> >> we are trying to fix here.
> >>
> >> Can you try to narrow down if ttm_sg_tt_init is called before calling
> >> this function for the tt object in question?
> > ttm_sg_tt_init is not called:
> > [   12.150124] nouveau :01:00.0: DRM: VRAM: 31 MiB
> > [   12.150133] nouveau :01:00.0: DRM: GART: 128 MiB
> > [   12.150143] nouveau :01:00.0: DRM: BMP version 5.6
> > [   12.150151] nouveau :01:00.0: DRM: No DCB data found in VBIOS
> > [   12.151362] ttm_tt_init
> > [   12.151370] ttm_tt_init_fields
> > [   12.151374] ttm_tt_alloc_page_directory
> > [   12.151615] BUG: kernel NULL pointer dereference, address: 
> 
> Please add dump_stack(); to ttm_tt_init() and report back with the 
> backtrace.
> 
> I can't see how this is called from the nouveau code, only possibility I 
> see is that it is maybe called through the AGP code somehow.

Yes, you're right:
[   13.192663] Call Trace:
[   13.192678]  dump_stack+0x54/0x68
[   13.192690]  ttm_tt_init+0x11/0x8a [ttm]
[   13.192699]  ttm_agp_tt_create+0x39/0x51 [ttm]
[   13.192840]  nouveau_ttm_tt_create+0x17/0x22 [nouveau]
[   13.192856]  ttm_tt_create+0x78/0x8c [ttm]
[   13.192864]  ttm_bo_handle_move_mem+0x7d/0xca [ttm]
[   13.192873]  ttm_bo_validate+0x92/0xc8 [ttm]
[   13.192883]  ttm_bo_init_reserved+0x216/0x243 [ttm]
[   13.192892]  ttm_bo_init+0x45/0x65 [ttm]
[   13.193018]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
[   13.193150]  nouveau_bo_init+0x8c/0x94 [nouveau]
[   13.193273]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
[   13.193407]  nouveau_bo_new+0x44/0x57 [nouveau]
[   13.193537]  nouveau_channel_prep+0xa3/0x269 [nouveau]
[   13.193665]  nouveau_channel_new+0x3c/0x5f7 [nouveau]
[   13.193679]  ? slab_free_freelist_hook+0x3b/0xa7
[   13.193686]  ? kfree+0x9e/0x11a
[   13.193781]  ? nvif_object_sclass_put+0xd/0x16 [nouveau]
[   13.193908]  nouveau_drm_device_init+0x2e2/0x646 [nouveau]
[   13.193924]  ? pci_enable_device_flags+0x1e/0xac
[   13.194052]  nouveau_drm_probe+0xeb/0x188 [nouveau]
[   13.194182]  ? nouveau_drm_device_init+0x646/0x646 [nouveau]
[   13.194195]  pci_device_probe+0x89/0xe9
[   13.194205]  really_probe+0x127/0x2a7
[   13.194212]  driver_probe_device+0x5b/0x87
[   13.194219]  device_driver_attach+0x2e/0x41
[   13.194226]  __driver_attach+0x7c/0x83
[   13.194232]  bus_for_each_dev+0x4c/0x66
[   13.194238]  driver_attach+0x14/0x16
[   13.194244]  ? device_driver_attach+0x41/0x41
[   13.194251]  bus_add_driver+0xc5/0x16c
[   13.194258]  driver_register+0x87/0xb9
[   13.194265]  __pci_register_driver+0x38/0x3b
[   13.194271]  ? 0xf0c0d000
[   13.194362]  nouveau_drm_init+0x14c/0x1000 [nouveau]

How is ttm_dma_tt->dma_address allocated? I cannot find any assignment
executed (in the working code):

$ git grep dma_address\ = drivers/gpu/
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:   sg->sgl->dma_address = 
addr;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:dma_address = 
&dma->dma_address[offset >> PAGE_SHIFT];
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:dma_address = 
(mm_node->start << PAGE_SHIFT) + offs

Re: [PATCH] drm: Lock pointer access in drm_master_release()

2021-06-10 Thread Emil Velikov
On Thu, 10 Jun 2021 at 11:10, Daniel Vetter  wrote:
>
> On Wed, Jun 09, 2021 at 05:21:19PM +0800, Desmond Cheong Zhi Xi wrote:
> > This patch eliminates the following smatch warning:
> > drivers/gpu/drm/drm_auth.c:320 drm_master_release() warn: unlocked access 
> > 'master' (line 318) expected lock '&dev->master_mutex'
> >
> > The 'file_priv->master' field should be protected by the mutex lock to
> > '&dev->master_mutex'. This is because other processes can concurrently
> > modify this field and free the current 'file_priv->master'
> > pointer. This could result in a use-after-free error when 'master' is
> > dereferenced in subsequent function calls to
> > 'drm_legacy_lock_master_cleanup()' or to 'drm_lease_revoke()'.
> >
> > An example of a scenario that would produce this error can be seen
> > from a similar bug in 'drm_getunique()' that was reported by Syzbot:
> > https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f803
> >
> > In the Syzbot report, another process concurrently acquired the
> > device's master mutex in 'drm_setmaster_ioctl()', then overwrote
> > 'fpriv->master' in 'drm_new_set_master()'. The old value of
> > 'fpriv->master' was subsequently freed before the mutex was unlocked.
> >
> > Reported-by: Dan Carpenter 
> > Signed-off-by: Desmond Cheong Zhi Xi 
>
> Thanks a lot. I've done an audit of this code, and I found another
> potential problem in drm_is_current_master. The callers from drm_auth.c
> hold the dev->master_mutex, but all the external ones dont. I think we
> need to split this into a _locked function for use within drm_auth.c, and
> the exported one needs to grab the dev->master_mutex while it's checking
> master status. Ofc there will still be races, those are ok, but right now
> we run the risk of use-after free problems in drm_lease_owner.
>
Note that some code does acquire the mutex via
drm_master_internal_acquire - so we should be careful.
As mentioned elsewhere - having a _locked version of
drm_is_current_master sounds good.

Might as well throw a lockdep_assert_held_once in there just in case :-P

Happy to help review the follow-up patches.
-Emil
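The split discussed above can be sketched as follows. This is a userspace model in which a pthread mutex stands in for dev->master_mutex, a trylock stands in for lockdep_assert_held(), and the master check itself is simplified; the real check in drm_auth.c involves lease ownership and more state.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Illustrative stand-ins for the real drm structures. */
struct drm_device {
	pthread_mutex_t master_mutex;
	void *master;
};

struct drm_file {
	struct drm_device *dev;
	void *master;
};

/* Internal variant for callers (drm_auth.c) that already hold
 * dev->master_mutex.  The trylock assertion is a crude userspace
 * stand-in for lockdep_assert_held(): trylock must fail (EBUSY)
 * because the mutex is supposed to be taken already. */
static bool drm_is_current_master_locked(struct drm_file *fpriv)
{
	assert(pthread_mutex_trylock(&fpriv->dev->master_mutex) != 0);
	return fpriv->master != NULL && fpriv->master == fpriv->dev->master;
}

/* Exported variant: takes the mutex itself, so external callers
 * cannot race a concurrent SETMASTER that swaps and frees
 * fpriv->master while they are looking at it. */
static bool drm_is_current_master(struct drm_file *fpriv)
{
	bool ret;

	pthread_mutex_lock(&fpriv->dev->master_mutex);
	ret = drm_is_current_master_locked(fpriv);
	pthread_mutex_unlock(&fpriv->dev->master_mutex);
	return ret;
}
```

Races on master status can of course still happen right after the mutex is dropped, but the use-after-free window on the master pointer itself is closed.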


[PATCH v3] Documentation: gpu: Mention the requirements for new properties

2021-06-10 Thread Maxime Ripard
New KMS properties come with a bunch of requirements to prevent each
driver from running its own, inconsistent, set of properties,
eventually leading to issues like property conflicts, inconsistencies
between drivers and semantics, etc.

Let's document what we expect.

Cc: Alexandre Belloni 
Cc: Alexandre Torgue 
Cc: Alex Deucher 
Cc: Alison Wang 
Cc: Alyssa Rosenzweig 
Cc: Andrew Jeffery 
Cc: Andrzej Hajda 
Cc: Anitha Chrisanthus 
Cc: Benjamin Gaignard 
Cc: Ben Skeggs 
Cc: Boris Brezillon 
Cc: Brian Starkey 
Cc: Chen Feng 
Cc: Chen-Yu Tsai 
Cc: Christian Gmeiner 
Cc: "Christian König" 
Cc: Chun-Kuang Hu 
Cc: Edmund Dea 
Cc: Eric Anholt 
Cc: Fabio Estevam 
Cc: Gerd Hoffmann 
Cc: Haneen Mohammed 
Cc: Hans de Goede 
Cc: "Heiko Stübner" 
Cc: Huang Rui 
Cc: Hyun Kwon 
Cc: Inki Dae 
Cc: Jani Nikula 
Cc: Jernej Skrabec 
Cc: Jerome Brunet 
Cc: Joel Stanley 
Cc: John Stultz 
Cc: Jonas Karlman 
Cc: Jonathan Hunter 
Cc: Joonas Lahtinen 
Cc: Joonyoung Shim 
Cc: Jyri Sarha 
Cc: Kevin Hilman 
Cc: Kieran Bingham 
Cc: Krzysztof Kozlowski 
Cc: Kyungmin Park 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Liviu Dudau 
Cc: Lucas Stach 
Cc: Ludovic Desroches 
Cc: Marek Vasut 
Cc: Martin Blumenstingl 
Cc: Matthias Brugger 
Cc: Maxime Coquelin 
Cc: Maxime Ripard 
Cc: Melissa Wen 
Cc: Neil Armstrong 
Cc: Nicolas Ferre 
Cc: "Noralf Trønnes" 
Cc: NXP Linux Team 
Cc: Oleksandr Andrushchenko 
Cc: Patrik Jakobsson 
Cc: Paul Cercueil 
Cc: Pengutronix Kernel Team 
Cc: Philippe Cornu 
Cc: Philipp Zabel 
Cc: Qiang Yu 
Cc: Rob Clark 
Cc: Robert Foss 
Cc: Rob Herring 
Cc: Rodrigo Siqueira 
Cc: Rodrigo Vivi 
Cc: Roland Scheidegger 
Cc: Russell King 
Cc: Sam Ravnborg 
Cc: Sandy Huang 
Cc: Sascha Hauer 
Cc: Sean Paul 
Cc: Seung-Woo Kim 
Cc: Shawn Guo 
Cc: Stefan Agner 
Cc: Steven Price 
Cc: Sumit Semwal 
Cc: Thierry Reding 
Cc: Tian Tao 
Cc: Tomeu Vizoso 
Cc: Tomi Valkeinen 
Cc: VMware Graphics 
Cc: Xinliang Liu 
Cc: Xinwei Kong 
Cc: Yannick Fertre 
Cc: Zack Rusin 
Reviewed-by: Daniel Vetter 
Signed-off-by: Maxime Ripard 

---

Changes from v2:
  - Take into account the feedback from Laurent and Lidiu to no longer
force generic properties, but prefix vendor-specific properties with
the vendor name

Changes from v1:
  - Typos and wording reported by Daniel and Alex
---
 Documentation/gpu/drm-kms.rst | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index 87e5023e3f55..bbe254dca635 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -463,6 +463,33 @@ KMS Properties
 This section of the documentation is primarily aimed at user-space developers.
 For the driver APIs, see the other sections.
 
+Requirements
+
+
+KMS drivers might need to add extra properties to support new features.
+Each new property introduced in a driver needs to meet a few
+requirements, in addition to the one mentioned above:
+
+- Before the introduction of any vendor-specific properties, they must
+  be first checked against the generic ones to avoid any conflict or
+  redundancy.
+
+- Vendor-specific properties must be prefixed by the vendor's name,
+  following the syntax "$vendor:$property".
+
+- Generic properties must be standardized, with some documentation to
+  describe how the property can be used.
+
+- Generic properties must provide a generic helper in the core code to
+  register that property on the object it attaches to.
+
+- Generic properties content must be decoded by the core and provided in
+  the object's associated state structure. That includes anything
+  drivers might want to precompute, like :c:type:`struct drm_clip_rect
+  ` for planes.
+
+- An IGT test should be submitted.
+
 Property Types and Blob Property Support
 
 
-- 
2.31.1



[Bug 213391] AMDGPU retries page fault with some specific processes amdgpu and sometimes followed [gfxhub0] retry page fault until *ERROR* ring gfx timeout, but soft recovered

2021-06-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213391

--- Comment #7 from Nirmoy (nirmoy.ai...@gmail.com) ---
Actually, I am wrong, I checked out v5.12.9-arch1 from Arch and realized the
fix I mentioned before isn't valid.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 3/3] drm/tegra: Use fourcc_mod_is_vendor() helper

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 01:12:36PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Rather than open-coding the vendor extraction operation, use the newly
> introduced helper macro.
> 
> Signed-off-by: Thierry Reding 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/tegra/fb.c| 2 +-
>  drivers/gpu/drm/tegra/plane.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
> index cae8b8cbe9dd..c04dda8353fd 100644
> --- a/drivers/gpu/drm/tegra/fb.c
> +++ b/drivers/gpu/drm/tegra/fb.c
> @@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
>  {
>   uint64_t modifier = framebuffer->modifier;
>  
> - if ((modifier >> 56) == DRM_FORMAT_MOD_VENDOR_NVIDIA) {
> + if (fourcc_mod_is_vendor(modifier, NVIDIA)) {
>   if ((modifier & DRM_FORMAT_MOD_NVIDIA_SECTOR_LAYOUT) == 0)
>   tiling->sector_layout = TEGRA_BO_SECTOR_LAYOUT_TEGRA;
>   else
> diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c
> index 2e65b4075ce6..f7496425fa83 100644
> --- a/drivers/gpu/drm/tegra/plane.c
> +++ b/drivers/gpu/drm/tegra/plane.c
> @@ -109,7 +109,7 @@ static bool tegra_plane_format_mod_supported(struct 
> drm_plane *plane,
>   return true;
>  
>   /* check for the sector layout bit */
> - if ((modifier >> 56) == DRM_FORMAT_MOD_VENDOR_NVIDIA) {
> + if (fourcc_mod_is_vendor(modifier, NVIDIA)) {
>   if (modifier & DRM_FORMAT_MOD_NVIDIA_SECTOR_LAYOUT) {
>   if (!tegra_plane_supports_sector_layout(plane))
>   return false;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4 1/2] drm/doc: document how userspace should find out CRTC index

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 11:27:42AM +0300, Pekka Paalanen wrote:
> On Wed,  9 Jun 2021 20:00:38 -0300
> Leandro Ribeiro  wrote:
> 
> > In this patch we add a section to document what userspace should do to
> > find out the CRTC index. This is important as they may be many places in
> > the documentation that need this, so it's better to just point to this
> > section and avoid repetition.
> > 
> > Signed-off-by: Leandro Ribeiro 
> > ---
> >  Documentation/gpu/drm-uapi.rst| 13 +
> >  drivers/gpu/drm/drm_debugfs_crc.c |  8 
> >  include/uapi/drm/drm.h|  4 ++--
> >  3 files changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> > index 04bdc7a91d53..7e51dd40bf6e 100644
> > --- a/Documentation/gpu/drm-uapi.rst
> > +++ b/Documentation/gpu/drm-uapi.rst
> > @@ -457,6 +457,19 @@ Userspace API Structures
> >  .. kernel-doc:: include/uapi/drm/drm_mode.h
> > :doc: overview
> > 
> > +.. _crtc_index:
> > +
> > +CRTC index
> > +--
> > +
> > +CRTC's have both an object ID and an index, and they are not the same 
> > thing.
> > +The index is used in cases where a densely packed identifier for a CRTC is
> > +needed, for instance a bitmask of CRTC's. The member possible_crtcs of 
> > struct
> > +drm_mode_get_plane is an example.
> > +
> > +DRM_IOCTL_MODE_GETRESOURCES populates a structure with an array of CRTC 
> > ID's,
> > +and the CRTC index is its position in this array.
> > +
> >  .. kernel-doc:: include/uapi/drm/drm.h
> > :internal:
> > 
> > diff --git a/drivers/gpu/drm/drm_debugfs_crc.c 
> > b/drivers/gpu/drm/drm_debugfs_crc.c
> > index 3dd70d813f69..bbc3bc4ba844 100644
> > --- a/drivers/gpu/drm/drm_debugfs_crc.c
> > +++ b/drivers/gpu/drm/drm_debugfs_crc.c
> > @@ -46,10 +46,10 @@
> >   * it reached a given hardware component (a CRC sampling "source").
> >   *
> >   * Userspace can control generation of CRCs in a given CRTC by writing to 
> > the
> > - * file dri/0/crtc-N/crc/control in debugfs, with N being the index of the 
> > CRTC.
> > - * Accepted values are source names (which are driver-specific) and the 
> > "auto"
> > - * keyword, which will let the driver select a default source of frame CRCs
> > - * for this CRTC.
> > + * file dri/0/crtc-N/crc/control in debugfs, with N being the :ref:`index 
> > of
> > + * the CRTC`. Accepted values are source names (which are
> > + * driver-specific) and the "auto" keyword, which will let the driver 
> > select a
> > + * default source of frame CRCs for this CRTC.
> >   *
> >   * Once frame CRC generation is enabled, userspace can capture them by 
> > reading
> >   * the dri/0/crtc-N/crc/data file. Each line in that file contains the 
> > frame
> > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> > index 67b94bc3c885..bbf4e76daa55 100644
> > --- a/include/uapi/drm/drm.h
> > +++ b/include/uapi/drm/drm.h
> > @@ -635,8 +635,8 @@ struct drm_gem_open {
> >  /**
> >   * DRM_CAP_VBLANK_HIGH_CRTC
> >   *
> > - * If set to 1, the kernel supports specifying a CRTC index in the high 
> > bits of
> > - * &drm_wait_vblank_request.type.
> > + * If set to 1, the kernel supports specifying a :ref:`CRTC 
> > index`
> > + * in the high bits of &drm_wait_vblank_request.type.
> >   *
> >   * Starting kernel version 2.6.39, this capability is always set to 1.
> >   */
> > --
> > 2.31.1
> > 
> 
> Hi,
> 
> with the caveat that I didn't actually build the docs and see how they
> look:
> 
> Reviewed-by: Pekka Paalanen 

Pushed to drm-misc-next, thanks for the patch&review.
-Daniel

> 
> 
> Thanks,
> pq



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PULL] drm-misc-fixes

2021-06-10 Thread Maxime Ripard
Hi Dave, Daniel,

Here's this week drm-misc-fixes PR

Thanks!
Maxime

drm-misc-fixes-2021-06-10:
One fix for sun4i that prevents it from probing, two locking fixes for
ttm and drm_auth, one off-by-x1000 fix for mcde and a fix for vc4 to
prevent an out-of-bounds access.
The following changes since commit 0b78f8bcf4951af30b0ae83ea4fad27d641ab617:

  Revert "fb_defio: Remove custom address_space_operations" (2021-06-01 
17:38:40 +0200)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2021-06-10

for you to fetch changes up to c336a5ee984708db4826ef9e47d184e638e29717:

  drm: Lock pointer access in drm_master_release() (2021-06-10 12:22:02 +0200)


One fix for sun4i that prevents it from probing, two locking fixes for
ttm and drm_auth, one off-by-x1000 fix for mcde and a fix for vc4 to
prevent an out-of-bounds access.


Christian König (1):
  drm/ttm: fix deref of bo->ttm without holding the lock v2

Desmond Cheong Zhi Xi (2):
  drm: Fix use-after-free read in drm_getunique()
  drm: Lock pointer access in drm_master_release()

Linus Walleij (1):
  drm/mcde: Fix off by 10^3 in calculation

Mark Rutland (1):
  drm/vc4: fix vc4_atomic_commit_tail() logic

Saravana Kannan (1):
  drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device

 drivers/gpu/drm/drm_auth.c |  3 ++-
 drivers/gpu/drm/drm_ioctl.c|  9 
 drivers/gpu/drm/mcde/mcde_dsi.c|  2 +-
 drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c  | 31 +
 drivers/gpu/drm/sun4i/sun8i_dw_hdmi.h  |  5 +++--
 drivers/gpu/drm/sun4i/sun8i_hdmi_phy.c | 41 +-
 drivers/gpu/drm/ttm/ttm_bo.c   |  5 -
 drivers/gpu/drm/ttm/ttm_device.c   |  8 +--
 drivers/gpu/drm/vc4/vc4_kms.c  |  2 +-
 9 files changed, 80 insertions(+), 26 deletions(-)




Re: [PATCH v2 1/7] drm/sysfs: introduce drm_sysfs_connector_hotplug_event

2021-06-10 Thread Maxime Ripard
Hi,

On Wed, Jun 09, 2021 at 09:23:27PM +, Simon Ser wrote:
> This function sends a hotplug uevent with a CONNECTOR property.
> 
> Signed-off-by: Simon Ser 
> ---
>  drivers/gpu/drm/drm_sysfs.c | 25 +
>  include/drm/drm_sysfs.h |  1 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 968a9560b4aa..8423e44c3035 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -343,6 +343,31 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>  
> +/**
> + * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any 
> connector
> + * change
> + * @connector: connector which has changed
> + *
> + * Send a uevent for the DRM connector specified by @connector. This will 
> send
> + * a uevent with the properties HOTPLUG=1 and CONNECTOR.
> + */
> +void drm_sysfs_connector_hotplug_event(struct drm_connector *connector)
> +{
> + struct drm_device *dev = connector->dev;
> + char hotplug_str[] = "HOTPLUG=1", conn_id[21];
> + char *envp[] = { hotplug_str, conn_id, NULL };
> +
> + snprintf(conn_id, sizeof(conn_id),
> +  "CONNECTOR=%u", connector->base.id);
> +
> + drm_dbg_kms(connector->dev,
> + "[CONNECTOR:%d:%s] generating connector hotplug event\n",
> + connector->base.id, connector->name);
> +
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_connector_hotplug_event);

Would it make sense to call sysfs_notify on the status file?

It would allow calling poll() on the status file in sysfs and skipping
udev in simple cases?

Maxime




Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 06:54:13PM +0200, Christian König wrote:
> Am 10.06.21 um 18:37 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:
> > > On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  
> > > wrote:
> > > > On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  
> > > > wrote:
> > > > > On Thu, Jun 10, 2021 at 1:51 AM Christian König
> > > > >  wrote:
> > > > > > Am 09.06.21 um 23:29 schrieb Jason Ekstrand:
> > > > > > > This helper existed to handle the weird corner-cases caused by 
> > > > > > > using
> > > > > > > SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is 
> > > > > > > using
> > > > > > > that anymore (i915 was the only real user), dma_fence_get_rcu is
> > > > > > > sufficient.  The one slightly annoying thing we have to deal with 
> > > > > > > here
> > > > > > > is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
> > > > > > > SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each 
> > > > > > > call site
> > > > > > > ends up being 3 lines instead of 1.
> > > > > > That's an outright NAK.
> > > > > > 
> > > > > > The loop in dma_fence_get_rcu_safe is necessary because the 
> > > > > > underlying
> > > > > > fence object can be replaced while taking the reference.
> > > > > Right.  I had missed a bit of that when I first read through it.  I
> > > > > see the need for the loop now.  But there are some other tricky bits
> > > > > in there besides just the loop.
> > > > I thought that's what the kref_get_unless_zero was for in
> > > > dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
> > > > dma_fence_get_rcu around, since that should either be a kref_get or
> > > > it's just unsafe to call it ...
> > > AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
> > > your fence and it's never recycled.
> > > 
> > > Where the loop comes in is if you have someone come along, under the
> > > RCU write lock or not, and swap out the pointer and unref it while
> > > you're trying to fetch it.  In this case, if you just write the three
> > > lines I duplicated throughout this patch, you'll end up with NULL if
> > > you (partially) lose the race.  The loop exists to ensure that you get
> > > either the old pointer or the new pointer and you only ever get NULL
> > > if somewhere during the mess, the pointer actually gets set to NULL.
> > It's not that easy. At least not for dma_resv.
> > 
> > The thing is, you can't just go in and replace the write fence with
> > something else. There's supposed to be some ordering here (how much we
> > actually still follow that or not is a bit another question, that I'm
> > trying to answer with an audit of lots of drivers), which means if you
> > replace e.g. the exclusive fence, the previous fence will _not_ just
> > get freed. Because the next exclusive fence needs to wait for that to
> > finish first.
> > 
> > Conceptually the refcount will _only_ go to 0 once all later
> > dependencies have seen it get signalled, and once the fence itself has
> > been signalled.
> 
> I think that's the point where it breaks.
> 
> See IIRC radeon for example doesn't keep unsignaled fences around when
> nobody is interested in them. And I think nouveau does it that way as well.
> 
> So for example you can have the following
> 1. Submission to 3D ring, this creates fence A.
> 2. Fence A is put as an exclusive fence in a dma_resv object.
> 3. Submission to 3D ring, this creates fence B.
> 4. Fence B is replacing fence A as the exclusive fence in the dma_resv
> object.
> 
> Fence A is replaced and therefore destroyed while it is not even close to be
> signaled. But the replacement is perfectly ok, since fence B is submitted to
> the same ring.
> 
> When somebody would use dma_fence_get_rcu on the exclusive fence and get
> NULL it would fail to wait for the submissions. You don't really need the
> SLAB_TYPESAFE_BY_RCU for this to blow up in your face.

Uh that's wild ...

I thought that's impossible, but in dma_fence_release() we only complain
if there's both waiters and the fence isn't signalled yet. I had no idea.

> We could change that rule of course, amdgpu for example is always keeping
> fences around until they are signaled. But IIRC that's how it was for radeon
> like forever.

Yeah I think we could, but then we need to do a few things:
- document that de facto only get_rcu_safe is ok to use
- delete get_rcu, it's not really a safe thing to do anywhere

-Daniel

> 
> Regards,
> Christian.
> 
> >   A signalled fence might as well not exist, so if
> > that's what  happened in that tiny window, then yes a legal scenario
> > is the following:
> > 
> > thread A:
> > - rcu_dereference(resv->exclusive_fence);
> > 
> > thread B:
> > - dma_fence signals, retires, drops refcount to 0
> > - sets the exclusive fence to NULL
> > - creates a new dma_fence
> > - sets the exclusive fence to that new fence
> > 
> > thread A:
> > - kref_get_unless_zero fails, we report that the exclusive fence slot is
> > NULL

Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 11:52:23AM -0500, Jason Ekstrand wrote:
> On Thu, Jun 10, 2021 at 11:38 AM Daniel Vetter  wrote:
> >
> > On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:
> > >
> > > On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  
> > > wrote:
> > > >
> > > > On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  
> > > > wrote:
> > > > >
> > > > > On Thu, Jun 10, 2021 at 1:51 AM Christian König
> > > > >  wrote:
> > > > > >
> > > > > > Am 09.06.21 um 23:29 schrieb Jason Ekstrand:
> > > > > > > This helper existed to handle the weird corner-cases caused by 
> > > > > > > using
> > > > > > > SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is 
> > > > > > > using
> > > > > > > that anymore (i915 was the only real user), dma_fence_get_rcu is
> > > > > > > sufficient.  The one slightly annoying thing we have to deal with 
> > > > > > > here
> > > > > > > is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
> > > > > > > SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each 
> > > > > > > call site
> > > > > > > ends up being 3 lines instead of 1.
> > > > > >
> > > > > > That's an outright NAK.
> > > > > >
> > > > > > The loop in dma_fence_get_rcu_safe is necessary because the 
> > > > > > underlying
> > > > > > fence object can be replaced while taking the reference.
> > > > >
> > > > > Right.  I had missed a bit of that when I first read through it.  I
> > > > > see the need for the loop now.  But there are some other tricky bits
> > > > > in there besides just the loop.
> > > >
> > > > I thought that's what the kref_get_unless_zero was for in
> > > > dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
> > > > dma_fence_get_rcu around, since that should either be a kref_get or
> > > > it's just unsafe to call it ...
> > >
> > > AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
> > > your fence and it's never recycled.
> > >
> > > Where the loop comes in is if you have someone come along, under the
> > > RCU write lock or not, and swap out the pointer and unref it while
> > > you're trying to fetch it.  In this case, if you just write the three
> > > lines I duplicated throughout this patch, you'll end up with NULL if
> > > you (partially) lose the race.  The loop exists to ensure that you get
> > > either the old pointer or the new pointer and you only ever get NULL
> > > if somewhere during the mess, the pointer actually gets set to NULL.
> >
> > It's not that easy. At least not for dma_resv.
> >
> > The thing is, you can't just go in and replace the write fence with
> > something else. There's supposed to be some ordering here (how much we
> > actually still follow that or not is a bit another question, that I'm
> > trying to answer with an audit of lots of drivers), which means if you
> > replace e.g. the exclusive fence, the previous fence will _not_ just
> > get freed. Because the next exclusive fence needs to wait for that to
> > finish first.
> >
> > Conceptually the refcount will _only_ go to 0 once all later
> > dependencies have seen it get signalled, and once the fence itself has
> > been signalled. A signalled fence might as well not exist, so if
> > that's what  happened in that tiny window, then yes a legal scenario
> > is the following:
> >
> > thread A:
> > - rcu_dereference(resv->exclusive_fence);
> >
> > thread B:
> > - dma_fence signals, retires, drops refcount to 0
> > - sets the exclusive fence to NULL
> > - creates a new dma_fence
> > - sets the exclusive fence to that new fence
> >
> > thread A:
> > - kref_get_unless_zero fails, we report that the exclusive fence slot is 
> > NULL
> >
> > Ofc normally we're fully pipelined, and we lazily clear slots, so no
> > one ever writes the fence ptr to NULL. But conceptually it's totally
> > fine, and an indistinguishable sequence of events from the point of
> > view of thread A.
> 
> How is reporting that the exclusive fence is NULL ok in that scenario?
>  If someone comes along and calls dma_resv_get_excl_fence(), we want
> them to get either the old fence or the new fence but never NULL.
> NULL would imply that the object is idle which it probably isn't in
> any sort of pipelined world.

The thing is, the kref_get_unless_zero _only_ fails when the object could
have been idle meanwhile and its exclusive fence slot NULL.

Maybe no one wrote that NULL, but from thread A's pov there's no
difference between those. Therefore returning NULL in that case is totally
fine.

It is _not_ possible for that kref_get_unless_zero to fail, while the
fence isn't signalled yet.

I think we might need to go through this on irc a bit ...
-Daniel

> > Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
> > big time. The only reason you need _unsafe is if you have
> > typesafe_by_rcu, or maybe if you yolo your fence ordering a bit much
> > and break the DAG property in a few cases.
> >
> > > I agree with Christian that that part of dma_fence_get_rcu_safe needs
> > > to stay.

Re: [PATCH 3/9] drm/vmwgfx: Fix subresource updates with new contexts

2021-06-10 Thread Zack Rusin

On 6/10/21 2:49 AM, Thomas Hellström (Intel) wrote:

Hi,

On 6/9/21 7:23 PM, Zack Rusin wrote:

The has_dx variable was only set during the initialization which
meant that UPDATE_SUBRESOURCE was never used. We were emulating it
with UPDATE_GB_IMAGE but that's always been a stop-gap. Instead
of has_dx which has been deprecated a long time ago we need to check
for whether shader model 4.0 or newer is available to the device.


Stupid question perhaps, but isn't UPDATE_SUBRESOURCE available with 
SVGA_CAP_DX regardless of the SM capabilities of the underlying device?


It is, but the extra functionality it provides is a bit pointless on older 
contexts. In general we're trying to bundle the features into something more 
closely resembling the Windows side. That's not for the benefit of the guest 
but of the host, or more specifically so that the stack is more coherent and 
vmwgfx isn't doing something uncommon (i.e. using dx10 features with CAP_DX 
but without CAP_DXCONTEXT) where renderers might be asked to do something 
they've never been tested for.

We've overloaded the shader model 4.0 naming in ways that aren't ideal, so 
has_sm4_context really is CAP_DX & CAP_DXCONTEXT; we should've probably gone 
with has_d3d10_feature_level, has_d3d11_feature_level, has_gl43_feature_level 
etc. instead.

z


Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Christian König

Am 10.06.21 um 18:37 schrieb Daniel Vetter:

On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  wrote:

On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  wrote:

On Thu, Jun 10, 2021 at 1:51 AM Christian König
 wrote:

Am 09.06.21 um 23:29 schrieb Jason Ekstrand:

This helper existed to handle the weird corner-cases caused by using
SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
that anymore (i915 was the only real user), dma_fence_get_rcu is
sufficient.  The one slightly annoying thing we have to deal with here
is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
ends up being 3 lines instead of 1.

That's an outright NAK.

The loop in dma_fence_get_rcu_safe is necessary because the underlying
fence object can be replaced while taking the reference.

Right.  I had missed a bit of that when I first read through it.  I
see the need for the loop now.  But there are some other tricky bits
in there besides just the loop.

I thought that's what the kref_get_unless_zero was for in
dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
dma_fence_get_rcu around, since that should either be a kref_get or
it's just unsafe to call it ...

AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
your fence and it's never recycled.

Where the loop comes in is if you have someone come along, under the
RCU write lock or not, and swap out the pointer and unref it while
you're trying to fetch it.  In this case, if you just write the three
lines I duplicated throughout this patch, you'll end up with NULL if
you (partially) lose the race.  The loop exists to ensure that you get
either the old pointer or the new pointer and you only ever get NULL
if somewhere during the mess, the pointer actually gets set to NULL.

It's not that easy. At least not for dma_resv.

The thing is, you can't just go in and replace the write fence with
something else. There's supposed to be some ordering here (how much we
actually still follow that or not is a bit another question, that I'm
trying to answer with an audit of lots of drivers), which means if you
replace e.g. the exclusive fence, the previous fence will _not_ just
get freed. Because the next exclusive fence needs to wait for that to
finish first.

Conceptually the refcount will _only_ go to 0 once all later
dependencies have seen it get signalled, and once the fence itself has
been signalled.


I think that's the point where it breaks.

See IIRC radeon for example doesn't keep unsignaled fences around when 
nobody is interested in them. And I think nouveau does it that way as well.


So for example you can have the following
1. Submission to 3D ring, this creates fence A.
2. Fence A is put as an exclusive fence in a dma_resv object.
3. Submission to 3D ring, this creates fence B.
4. Fence B is replacing fence A as the exclusive fence in the dma_resv 
object.


Fence A is replaced and therefore destroyed while it is not even close 
to be signaled. But the replacement is perfectly ok, since fence B is 
submitted to the same ring.


When somebody would use dma_fence_get_rcu on the exclusive fence and get 
NULL it would fail to wait for the submissions. You don't really need 
the SLAB_TYPESAFE_BY_RCU for this to blow up in your face.


We could change that rule of course, amdgpu for example is always keeping 
fences around until they are signaled. But IIRC that's how it was for 
radeon like forever.


Regards,
Christian.


  A signalled fence might as well not exist, so if
that's what  happened in that tiny window, then yes a legal scenario
is the following:

thread A:
- rcu_dereference(resv->exclusive_fence);

thread B:
- dma_fence signals, retires, drops refcount to 0
- sets the exclusive fence to NULL
- creates a new dma_fence
- sets the exclusive fence to that new fence

thread A:
- kref_get_unless_zero fails, we report that the exclusive fence slot is NULL

Ofc normally we're fully pipelined, and we lazily clear slots, so no
one ever writes the fence ptr to NULL. But conceptually it's totally
fine, and an indistinguishable sequence of events from the point of
view of thread A.

Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
big time. The only reason you need _unsafe is if you have
typesafe_by_rcu, or maybe if you yolo your fence ordering a bit much
and break the DAG property in a few cases.


I agree with Christian that that part of dma_fence_get_rcu_safe needs
to stay.  I was missing that until I did my giant "let's walk through
the code" e-mail.

Well if I'm wrong there's a _ton_ of broken code in upstream right
now, even in dma-buf/dma-resv.c. We're using dma_fence_get_rcu a lot.

Also the timing is all backwards: get_rcu_safe was added as a fix for
when i915 made its dma_fence typesafe_by_rcu. We didn't have any need
for this beforehand. So I'm really not quite buying this story

Re: [Intel-gfx] [PATCH 16/31] drm/i915/gem: Add an intermediate proto_context struct (v4)

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 11:44 AM Daniel Vetter  wrote:
>
> On Wed, Jun 09, 2021 at 11:00:26AM -0500, Jason Ekstrand wrote:
> > On Wed, Jun 9, 2021 at 6:28 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Jun 08, 2021 at 11:35:58PM -0500, Jason Ekstrand wrote:
> > > > The current context uAPI allows for two methods of setting context
> > > > parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
> > > > former is allowed to be called at any time while the later happens as
> > > > part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
> > > > settable via the other.  While some params are fairly simple and setting
> > > > them on a live context is harmless, such as the context priority; others are
> > > > far trickier such as the VM or the set of engines.  In order to swap out
> > > > the VM, for instance, we have to delay until all current in-flight work
> > > > is complete, swap in the new VM, and then continue.  This leads to a
> > > > plethora of potential race conditions we'd really rather avoid.
> > > >
> > > > Unfortunately, both methods of setting the VM and the engine set are in
> > > > active use today so we can't simply disallow setting the VM or engine
> > > > set via SET_CONTEXT_PARAM.  In order to work around this wart, this
> > > > commit adds a proto-context struct which contains all the context create
> > > > parameters.
> > > >
> > > > v2 (Daniel Vetter):
> > > >  - Better commit message
> > > >  - Use __set/clear_bit instead of set/clear_bit because there's no race
> > > >and we don't need the atomics
> > > >
> > > > v3 (Daniel Vetter):
> > > >  - Use manual bitops and BIT() instead of __set_bit
> > > >
> > > > v4 (Daniel Vetter):
> > > >  - Add a changelog to the commit message
> > > >  - Better hyperlinking in docs
> > > >  - Create the default PPGTT in i915_gem_create_context
> > > >
> > > > Signed-off-by: Jason Ekstrand 
> > > > ---
> > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 124 +++---
> > > >  .../gpu/drm/i915/gem/i915_gem_context_types.h |  22 
> > > >  .../gpu/drm/i915/gem/selftests/mock_context.c |  16 ++-
> > > >  3 files changed, 145 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > index f9a6eac78c0ae..b5d8c1ff5d7b3 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > @@ -191,6 +191,83 @@ static int validate_priority(struct 
> > > > drm_i915_private *i915,
> > > >   return 0;
> > > >  }
> > > >
> > > > +static void proto_context_close(struct i915_gem_proto_context *pc)
> > > > +{
> > > > + if (pc->vm)
> > > > + i915_vm_put(pc->vm);
> > > > + kfree(pc);
> > > > +}
> > > > +
> > > > +static int proto_context_set_persistence(struct drm_i915_private *i915,
> > > > +  struct i915_gem_proto_context 
> > > > *pc,
> > > > +  bool persist)
> > > > +{
> > > > + if (persist) {
> > > > + /*
> > > > +  * Only contexts that are short-lived [that will expire 
> > > > or be
> > > > +  * reset] are allowed to survive past termination. We 
> > > > require
> > > > +  * hangcheck to ensure that the persistent requests are 
> > > > healthy.
> > > > +  */
> > > > + if (!i915->params.enable_hangcheck)
> > > > + return -EINVAL;
> > > > +
> > > > + pc->user_flags |= BIT(UCONTEXT_PERSISTENCE);
> > > > + } else {
> > > > + /* To cancel a context we use "preempt-to-idle" */
> > > > + if (!(i915->caps.scheduler & 
> > > > I915_SCHEDULER_CAP_PREEMPTION))
> > > > + return -ENODEV;
> > > > +
> > > > + /*
> > > > +  * If the cancel fails, we then need to reset, cleanly!
> > > > +  *
> > > > +  * If the per-engine reset fails, all hope is lost! We 
> > > > resort
> > > > +  * to a full GPU reset in that unlikely case, but 
> > > > realistically
> > > > +  * if the engine could not reset, the full reset does not 
> > > > fare
> > > > +  * much better. The damage has been done.
> > > > +  *
> > > > +  * However, if we cannot reset an engine by itself, we 
> > > > cannot
> > > > +  * cleanup a hanging persistent context without causing
> > > > +  * colateral damage, and we should not pretend we can by
> > > > +  * exposing the interface.
> > > > +  */
> > > > + if (!intel_has_reset_engine(&i915->gt))
> > > > + return -ENODEV;
> > > > +
> > > > + pc->user_flags &= ~BIT(UCONTEXT_PERSISTENCE);
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static struct i915_gem_proto_context *
> > > > +proto_con

Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 11:38 AM Daniel Vetter  wrote:
>
> On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:
> >
> > On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  
> > wrote:
> > >
> > > On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  
> > > wrote:
> > > >
> > > > On Thu, Jun 10, 2021 at 1:51 AM Christian König
> > > >  wrote:
> > > > >
> > > > > Am 09.06.21 um 23:29 schrieb Jason Ekstrand:
> > > > > This helper existed to handle the weird corner-cases caused by using
> > > > > SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
> > > > > that anymore (i915 was the only real user), dma_fence_get_rcu is
> > > > > sufficient.  The one slightly annoying thing we have to deal with here
> > > > > is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
> > > > > SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
> > > > > ends up being 3 lines instead of 1.
> > > > >
> > > > > That's an outright NAK.
> > > > >
> > > > > The loop in dma_fence_get_rcu_safe is necessary because the underlying
> > > > > fence object can be replaced while taking the reference.
> > > >
> > > > Right.  I had missed a bit of that when I first read through it.  I
> > > > see the need for the loop now.  But there are some other tricky bits
> > > > in there besides just the loop.
> > >
> > > I thought that's what the kref_get_unless_zero was for in
> > > dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
> > > dma_fence_get_rcu around, since that should either be a kref_get or
> > > it's just unsafe to call it ...
> >
> > AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
> > your fence and it's never recycled.
> >
> > Where the loop comes in is if you have someone come along, under the
> > RCU write lock or not, and swap out the pointer and unref it while
> > you're trying to fetch it.  In this case, if you just write the three
> > lines I duplicated throughout this patch, you'll end up with NULL if
> > you (partially) lose the race.  The loop exists to ensure that you get
> > either the old pointer or the new pointer and you only ever get NULL
> > if somewhere during the mess, the pointer actually gets set to NULL.
>
> It's not that easy. At least not for dma_resv.
>
> The thing is, you can't just go in and replace the write fence with
> something else. There's supposed to be some ordering here (how much we
> actually still follow that or not is a bit another question, that I'm
> trying to answer with an audit of lots of drivers), which means if you
> replace e.g. the exclusive fence, the previous fence will _not_ just
> get freed. Because the next exclusive fence needs to wait for that to
> finish first.
>
> Conceptually the refcount will _only_ go to 0 once all later
> dependencies have seen it get signalled, and once the fence itself has
> been signalled. A signalled fence might as well not exist, so if
> that's what  happened in that tiny window, then yes a legal scenario
> is the following:
>
> thread A:
> - rcu_dereference(resv->exclusive_fence);
>
> thread B:
> - dma_fence signals, retires, drops refcount to 0
> - sets the exclusive fence to NULL
> - creates a new dma_fence
> - sets the exclusive fence to that new fence
>
> thread A:
> - kref_get_unless_zero fails, we report that the exclusive fence slot is NULL
>
> Ofc normally we're fully pipelined, and we lazily clear slots, so no
> one ever writes the fence ptr to NULL. But conceptually it's totally
> fine, and an indistinguishable sequence of events from the point of
> view of thread A.

How is reporting that the exclusive fence is NULL ok in that scenario?
If someone comes along and calls dma_resv_get_excl_fence(), we want
them to get either the old fence or the new fence but never NULL.
NULL would imply that the object is idle which it probably isn't in
any sort of pipelined world.

> Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
> big time. The only reason you need _unsafe is if you have
> typesafe_by_rcu, or maybe if you yolo your fence ordering a bit much
> and break the DAG property in a few cases.
>
> > I agree with Christian that that part of dma_fence_get_rcu_safe needs
> > to stay.  I was missing that until I did my giant "let's walk through
> > the code" e-mail.
>
> Well if I'm wrong there's a _ton_ of broken code in upstream right
> now, even in dma-buf/dma-resv.c. We're using dma_fence_get_rcu a lot.

Yup.  19 times.  What I'm trying to understand is how much of that
code depends on properly catching a pointer-switch race and how much
is ok with a NULL failure mode.  This trybot seems to imply that most
things are ok with the NULL failure mode:

https://patchwork.freedesktop.org/series/91267/

Of course, as we discussed on IRC, I'm not sure how much I trust
proof-by-trybot here. :-)
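The retry loop being debated (the dma_fence_get_rcu_safe pattern) can be modeled in userspace C11 with a plain atomic refcount. This is only an illustrative sketch, not the kernel implementation: in the kernel, RCU keeps the memory valid across the window, and the names here are invented for the model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fence {
    atomic_int refcount;            /* 0 means signalled and on its way out */
};

/* kref_get_unless_zero() analogue: take a reference only if one is held. */
static bool fence_get_unless_zero(struct fence *f)
{
    int c = atomic_load(&f->refcount);
    while (c != 0)
        if (atomic_compare_exchange_weak(&f->refcount, &c, c + 1))
            return true;
    return false;
}

static void fence_put(struct fence *f)
{
    atomic_fetch_sub(&f->refcount, 1);
}

/* The get_rcu_safe-style loop: either return NULL (the slot really held
 * NULL) or a referenced fence that was in the slot at some point; never
 * NULL merely because we lost a race against a pointer swap.  In the
 * kernel, RCU guarantees the slot is updated before the memory goes
 * away, so the retry always makes progress. */
static struct fence *fence_get_safe(_Atomic(struct fence *) *slot)
{
    for (;;) {
        struct fence *f = atomic_load(slot);

        if (!f)
            return NULL;
        if (!fence_get_unless_zero(f))
            continue;               /* lost to the final put; re-read slot */
        if (f == atomic_load(slot))
            return f;               /* slot unchanged: reference is good */
        fence_put(f);               /* slot was swapped under us: retry */
    }
}
```

The final re-check of the slot is what the three-line open-coded version drops, and that is exactly the window where a concurrent swap turns a valid fetch into a spurious NULL.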

> Also the timing is all backwards: get_rcu_safe was added as a fix for
> when i915 made its dma_fence typesafe_by

Re: [PATCH] drm: Lock pointer access in drm_master_release()

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 11:21:39PM +0800, Desmond Cheong Zhi Xi wrote:
> On 10/6/21 6:10 pm, Daniel Vetter wrote:
> > On Wed, Jun 09, 2021 at 05:21:19PM +0800, Desmond Cheong Zhi Xi wrote:
> > > This patch eliminates the following smatch warning:
> > > drivers/gpu/drm/drm_auth.c:320 drm_master_release() warn: unlocked access 
> > > 'master' (line 318) expected lock '&dev->master_mutex'
> > > 
> > > The 'file_priv->master' field should be protected by the mutex lock to
> > > '&dev->master_mutex'. This is because other processes can concurrently
> > > modify this field and free the current 'file_priv->master'
> > > pointer. This could result in a use-after-free error when 'master' is
> > > dereferenced in subsequent function calls to
> > > 'drm_legacy_lock_master_cleanup()' or to 'drm_lease_revoke()'.
> > > 
> > > An example of a scenario that would produce this error can be seen
> > > from a similar bug in 'drm_getunique()' that was reported by Syzbot:
> > > https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f803
> > > 
> > > In the Syzbot report, another process concurrently acquired the
> > > device's master mutex in 'drm_setmaster_ioctl()', then overwrote
> > > 'fpriv->master' in 'drm_new_set_master()'. The old value of
> > > 'fpriv->master' was subsequently freed before the mutex was unlocked.
> > > 
> > > Reported-by: Dan Carpenter 
> > > Signed-off-by: Desmond Cheong Zhi Xi 
> > 
> > Thanks a lot. I've done an audit of this code, and I found another
> > potential problem in drm_is_current_master. The callers from drm_auth.c
> > hold the dev->master_mutex, but all the external ones dont. I think we
> > need to split this into a _locked function for use within drm_auth.c, and
> > the exported one needs to grab the dev->master_mutex while it's checking
> > master status. Ofc there will still be races, those are ok, but right now
> > we run the risk of use-after free problems in drm_lease_owner.
> > 
> > Are you up to do that fix too?
> > 
> 
> Hi Daniel,
> 
> Thanks for the pointer, I'm definitely up for it!
> 
> > I think the drm_lease.c code also needs an audit, there we'd need to make
> > sure that we hold hold either the lock or a full master reference to avoid
> > the use-after-free issues here.
> > 
> 
> I'd be happy to look into drm_lease.c as well.
> 
> > Patch merged to drm-misc-fixes with cc: stable.
> > -Daniel
> > 
> > > ---
> > >   drivers/gpu/drm/drm_auth.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> > > index f00e5abdbbf4..b59b26a71ad5 100644
> > > --- a/drivers/gpu/drm/drm_auth.c
> > > +++ b/drivers/gpu/drm/drm_auth.c
> > > @@ -315,9 +315,10 @@ int drm_master_open(struct drm_file *file_priv)
> > >   void drm_master_release(struct drm_file *file_priv)
> > >   {
> > >   struct drm_device *dev = file_priv->minor->dev;
> > > - struct drm_master *master = file_priv->master;
> > > + struct drm_master *master;
> > > 
> > >   mutex_lock(&dev->master_mutex);
> > > + master = file_priv->master;
> > >   if (file_priv->magic)
> > >   idr_remove(&file_priv->master->magic_map, 
> > > file_priv->magic);
> > > -- 
> > > 2.25.1
> > > 
> > 
> 
> From what I can see, there are other places in the kernel that could use the
> _locked version of drm_is_current_master as well, such as drm_mode_getfb in
> drm_framebuffer.c. I'll take a closer look, and if the changes make sense
> I'll prepare a patch series for them.

Oh maybe we have a naming confusion: the _locked is the one where the
caller must grab the lock already, whereas drm_is_current_master would
grab the master_mutex internally to do the check. The one in
drm_framebuffer.c looks like it'd need the internal one since there's no
other need to grab the master_mutex.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
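The split Daniel describes can be sketched with a userspace pthread stand-in. The struct layouts and names below are simplified placeholders for drm_device/drm_file, not the actual DRM code; in the kernel the locked variant would additionally carry a lockdep_assert_held() on the master mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for drm_device / drm_file. */
struct dev {
    pthread_mutex_t master_mutex;
    void *master;               /* current master, protected by master_mutex */
};

struct file_priv {
    struct dev *dev;
    void *master;               /* this file's master reference */
};

/* _locked variant: for callers (e.g. inside drm_auth.c) that already
 * hold dev->master_mutex. */
static bool is_current_master_locked(struct file_priv *fpriv)
{
    return fpriv->master && fpriv->master == fpriv->dev->master;
}

/* Exported variant: grabs the mutex itself, so external callers (like
 * the drm_framebuffer.c user mentioned above) don't race a concurrent
 * drm_setmaster_ioctl() swapping the master out from under them. */
static bool is_current_master(struct file_priv *fpriv)
{
    bool ret;

    pthread_mutex_lock(&fpriv->dev->master_mutex);
    ret = is_current_master_locked(fpriv);
    pthread_mutex_unlock(&fpriv->dev->master_mutex);
    return ret;
}
```

Races on the answer itself remain possible (the check can be stale the moment the mutex drops), but the use-after-free on the master pointer is gone.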


Re: Change how amdgpu stores fences in dma_resv objects

2021-06-10 Thread Christian König

On 10.06.21 at 18:34, Michel Dänzer wrote:

On 2021-06-10 11:17 a.m., Christian König wrote:

Since we can't find a consensus on how to move forward with the dma_resv object 
I concentrated on changing the approach for amdgpu first.

This new approach changes how the driver stores the command submission fence in 
the dma_resv object in DMA-buf exported BOs.

For exported BOs we now store the CS fence in a dma_fence_chain container and 
assign that one to the exclusive fences slot.

During synchronization this dma_fence_chain container is unpacked again and the 
containing fences handled individually.

This has a little bit more overhead than the old approach, but it allows for 
waiting for the exclusive slot for writes again.

Nice!

This seems to work as expected with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880: Some buffers now 
don't poll readable at first, until the GPU is done processing them.


Well I'm still pretty sure that any polling on the CPU should be 
avoided, but yes it is nice to have that working now in general.



Unfortunately, as expected, without a high priority context for the compositor 
which can preempt client drawing, this isn't enough to prevent slow clients 
from slowing down the compositor as well. But it should already help for 
fullscreen apps where the compositor can directly scan out the client buffers 
at least.


I have seen patches for this flying by internally, but not sure about 
the status.


Christian.
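The container approach can be sketched roughly: a chain node wraps one CS fence and links to the previous node, and synchronization walks the chain, handling each contained fence individually. This is loosely modeled on dma_fence_chain_for_each(); the types and names here are simplified for illustration and are not the amdgpu code.

```c
#include <stddef.h>

/* Simplified stand-ins: one CS fence, and a chain node wrapping it. */
struct fence {
    int seqno;
};

struct fence_chain {
    struct fence *fence;           /* the contained CS fence */
    struct fence_chain *prev;      /* older link in the chain */
};

/* Walk from the newest link to the oldest, handling each contained
 * fence individually -- the "unpacking" step during synchronization.
 * 'handle' may be NULL to just count the links.  Returns the number of
 * fences visited. */
static int chain_unpack(struct fence_chain *chain,
                        void (*handle)(struct fence *))
{
    int handled = 0;

    for (; chain; chain = chain->prev) {
        if (handle)
            handle(chain->fence);
        handled++;
    }
    return handled;
}
```

The extra cost Christian mentions comes from allocating a chain node per exported-BO submission and from this walk at sync time, in exchange for keeping a single exclusive slot that writers can still wait on.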


Re: [Intel-gfx] [PATCH 16/31] drm/i915/gem: Add an intermediate proto_context struct (v4)

2021-06-10 Thread Daniel Vetter
On Wed, Jun 09, 2021 at 11:00:26AM -0500, Jason Ekstrand wrote:
> On Wed, Jun 9, 2021 at 6:28 AM Daniel Vetter  wrote:
> >
> > On Tue, Jun 08, 2021 at 11:35:58PM -0500, Jason Ekstrand wrote:
> > > The current context uAPI allows for two methods of setting context
> > > parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
> > > former is allowed to be called at any time while the later happens as
> > > part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
> > > settable via the other.  While some params are fairly simple and setting
> > > them on a live context is harmless, such as the context priority, others are
> > > far trickier such as the VM or the set of engines.  In order to swap out
> > > the VM, for instance, we have to delay until all current in-flight work
> > > is complete, swap in the new VM, and then continue.  This leads to a
> > > plethora of potential race conditions we'd really rather avoid.
> > >
> > > Unfortunately, both methods of setting the VM and the engine set are in
> > > active use today so we can't simply disallow setting the VM or engine
> > > set via SET_CONTEXT_PARAM.  In order to work around this wart, this
> > > commit adds a proto-context struct which contains all the context create
> > > parameters.
> > >
> > > v2 (Daniel Vetter):
> > >  - Better commit message
> > >  - Use __set/clear_bit instead of set/clear_bit because there's no race
> > >and we don't need the atomics
> > >
> > > v3 (Daniel Vetter):
> > >  - Use manual bitops and BIT() instead of __set_bit
> > >
> > > v4 (Daniel Vetter):
> > >  - Add a changelog to the commit message
> > >  - Better hyperlinking in docs
> > >  - Create the default PPGTT in i915_gem_create_context
> > >
> > > Signed-off-by: Jason Ekstrand 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 124 +++---
> > >  .../gpu/drm/i915/gem/i915_gem_context_types.h |  22 
> > >  .../gpu/drm/i915/gem/selftests/mock_context.c |  16 ++-
> > >  3 files changed, 145 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > index f9a6eac78c0ae..b5d8c1ff5d7b3 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > @@ -191,6 +191,83 @@ static int validate_priority(struct drm_i915_private *i915,
> > >   return 0;
> > >  }
> > >
> > > +static void proto_context_close(struct i915_gem_proto_context *pc)
> > > +{
> > > + if (pc->vm)
> > > + i915_vm_put(pc->vm);
> > > + kfree(pc);
> > > +}
> > > +
> > > +static int proto_context_set_persistence(struct drm_i915_private *i915,
> > > +  struct i915_gem_proto_context *pc,
> > > +  bool persist)
> > > +{
> > > + if (persist) {
> > > + /*
> > > +  * Only contexts that are short-lived [that will expire or be
> > > +  * reset] are allowed to survive past termination. We require
> > > +  * hangcheck to ensure that the persistent requests are healthy.
> > > +  */
> > > + if (!i915->params.enable_hangcheck)
> > > + return -EINVAL;
> > > +
> > > + pc->user_flags |= BIT(UCONTEXT_PERSISTENCE);
> > > + } else {
> > > + /* To cancel a context we use "preempt-to-idle" */
> > > + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
> > > + return -ENODEV;
> > > +
> > > + /*
> > > +  * If the cancel fails, we then need to reset, cleanly!
> > > +  *
> > > +  * If the per-engine reset fails, all hope is lost! We resort
> > > +  * to a full GPU reset in that unlikely case, but realistically
> > > +  * if the engine could not reset, the full reset does not fare
> > > +  * much better. The damage has been done.
> > > +  *
> > > +  * However, if we cannot reset an engine by itself, we cannot
> > > +  * cleanup a hanging persistent context without causing
> > > +  * collateral damage, and we should not pretend we can by
> > > +  * exposing the interface.
> > > +  */
> > > + if (!intel_has_reset_engine(&i915->gt))
> > > + return -ENODEV;
> > > +
> > > + pc->user_flags &= ~BIT(UCONTEXT_PERSISTENCE);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static struct i915_gem_proto_context *
> > > +proto_context_create(struct drm_i915_private *i915, unsigned int flags)
> > > +{
> > > + struct i915_gem_proto_context *pc, *err;
> > > +
> > > + pc = kzalloc(sizeof(*pc), GFP_KERNEL);
> > > + if (!pc)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + pc->user_flags = BIT(

Re: handle exclusive fence similar to shared ones

2021-06-10 Thread Daniel Vetter
On Wed, Jun 09, 2021 at 04:07:24PM +0200, Christian König wrote:
> On 09.06.21 at 15:42, Daniel Vetter wrote:
> > [SNIP]
> > > That won't work. The problem is that you have only one exclusive slot, but
> > > multiple submissions which execute out of order and compose the buffer
> > > object together.
> > > 
> > > That's why I suggested to use the dma_fence_chain to circumvent this.
> > > 
> > > But if you are ok that amdgpu sets the exclusive fence without changing 
> > > the
> > > shared ones than the solution I've outlined should already work as well.
> > Uh that's indeed nasty. Can you give me the details of the exact use-case
> > so I can read the userspace code and come up with an idea? I was assuming
> > that even with parallel processing there's at least one step at the end
> > that unifies it for the next process.
> 
> Unfortunately not, with Vulkan that is really in the hands of the
> application.

Vulkan explicitly says implicit sync isn't a thing, and you need to
import/export syncobj if you e.g. want to share a buffer with GL.

Ofc because amdgpu always syncs there's a good chance that userspace
running on amdgpu vk doesn't get this right and is breaking the vk spec
here :-/

> But the example we have in the test cases is using 3D+DMA to compose a
> buffer IIRC.

Yeah that's the more interesting one I think. I've heard of some
post-processing steps, but that always needs to wait for 3D to finish. 3D
+ copy engine is a separate thing.

> > If we can't detect this somehow then it means we do indeed have to create
> > a fence_chain for the exclusive slot for everything, which would be nasty.
> 
> I've already created a prototype of that and it is not that bad. It does
> have some noticeable overhead, but I think that's ok.

Yup seen that, I'll go and review that tomorrow hopefully. It's not great,
but it's definitely a lot better than the force always sync.

> > Or a large-scale redo across all drivers, which is probaly even more
> > nasty.
> 
> Yeah, that is indeed harder to get right.

Yeah, and there's also a bunch of other confusions in that area.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Daniel Vetter
On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand  wrote:
>
> On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  wrote:
> >
> > On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  wrote:
> > >
> > > On Thu, Jun 10, 2021 at 1:51 AM Christian König
> > >  wrote:
> > > >
> > > > On 09.06.21 at 23:29, Jason Ekstrand wrote:
> > > > > This helper existed to handle the weird corner-cases caused by using
> > > > > SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
> > > > > that anymore (i915 was the only real user), dma_fence_get_rcu is
> > > > > sufficient.  The one slightly annoying thing we have to deal with here
> > > > > is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
> > > > > SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
> > > > > ends up being 3 lines instead of 1.
> > > >
> > > > That's an outright NAK.
> > > >
> > > > The loop in dma_fence_get_rcu_safe is necessary because the underlying
> > > > fence object can be replaced while taking the reference.
> > >
> > > Right.  I had missed a bit of that when I first read through it.  I
> > > see the need for the loop now.  But there are some other tricky bits
> > > in there besides just the loop.
> >
> > I thought that's what the kref_get_unless_zero was for in
> > dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
> > dma_fence_get_rcu around, since that should either be a kref_get or
> > it's just unsafe to call it ...
>
> AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
> your fence and it's never recycled.
>
> Where the loop comes in is if you have someone come along, under the
> RCU write lock or not, and swap out the pointer and unref it while
> you're trying to fetch it.  In this case, if you just write the three
> lines I duplicated throughout this patch, you'll end up with NULL if
> you (partially) lose the race.  The loop exists to ensure that you get
> either the old pointer or the new pointer and you only ever get NULL
> if somewhere during the mess, the pointer actually gets set to NULL.

It's not that easy. At least not for dma_resv.

The thing is, you can't just go in and replace the write fence with
something else. There's supposed to be some ordering here (how much we
actually still follow that or not is a bit another question, that I'm
trying to answer with an audit of lots of drivers), which means if you
replace e.g. the exclusive fence, the previous fence will _not_ just
get freed. Because the next exclusive fence needs to wait for that to
finish first.

Conceptually the refcount will _only_ go to 0 once all later
dependencies have seen it get signalled, and once the fence itself has
been signalled. A signalled fence might as well not exist, so if
that's what  happened in that tiny window, then yes a legal scenario
is the following:

thread A:
- rcu_dereference(resv->exclusive_fence);

thread B:
- dma_fence signals, retires, drops refcount to 0
- sets the exclusive fence to NULL
- creates a new dma_fence
- sets the exclusive fence to that new fence

thread A:
- kref_get_unless_zero fails, we report that the exclusive fence slot is NULL

Ofc normally we're fully pipelined, and we lazily clear slots, so no
one ever writes the fence ptr to NULL. But conceptually it's totally
fine, and an indistinguishable sequence of events from the point of
view of thread A.

Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
big time. The only reason you need _unsafe is if you have
typesafe_by_rcu, or maybe if you yolo your fence ordering a bit much
and break the DAG property in a few cases.

> I agree with Christian that that part of dma_fence_get_rcu_safe needs
> to stay.  I was missing that until I did my giant "let's walk through
> the code" e-mail.

Well if I'm wrong there's a _ton_ of broken code in upstream right
now, even in dma-buf/dma-resv.c. We're using dma_fence_get_rcu a lot.

Also the timing is all backwards: get_rcu_safe was added as a fix for
when i915 made its dma_fence typesafe_by_rcu. We didn't have any need
for this beforehand. So I'm really not quite buying this story here
yet you're all trying to sell me on.
-Daniel

>
> --Jason
>
> > > > This is completely unrelated to SLAB_TYPESAFE_BY_RCU. See the
> > > > dma_fence_chain usage for reference.
> > > >
> > > > What you can remove is the sequence number handling in dma-buf. That
> > > > should make adding fences quite a bit quicker.
> > >
> > > I'll look at that and try to understand what's going on there.
> >
> > Hm I thought the seqlock was to make sure we have a consistent set of
> > fences across exclusive and all shared slot. Not to protect against
> > the fence disappearing due to typesafe_by_rcu.
> > -Daniel
> >
> > > --Jason
> > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > >
> > > > > Signed-off-by: Jason Ekstrand 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Christian König 
> > > > > Cc: Matthew Auld 
> > > > > Cc: Maarten Lankhorst 
> > > > > ---
> >

Re: [Freedreno] [PATCH 0/8] dsi: rework clock parents and timing handling

2021-06-10 Thread abhinavk

Hi Dmitry

I will take a look at this next week for sure.

Thanks

Abhinav
On 2021-06-10 06:48, Dmitry Baryshkov wrote:

On 15/05/2021 16:12, Dmitry Baryshkov wrote:
This patch series brings back several patches targeting assigning dispcc
clock parents, that were removed from the massive dsi rework patchset
earlier.


Gracious ping for this series. I'd ask to skip patch 8 for now (as we
might bring that back for moving PHY to drivers/phy), but patches 1-7
are still valid and pending review/acceptance.



Few notes:
  - assign-clock-parents is a mandatory property according to the current
dsi.txt description.
  - There is little point in duplicating this functionality with the ad-hoc
implementation in the dsi code.

On top of that come few minor cleanups for the DSI PHY drivers.

I'd kindly ask to bring all dts changes also through the drm tree, so
that there won't be any breakage of the functionality.


The following changes since commit f2f46b878777e0d3f885c7ddad48f477b4dea247:

   drm/msm/dp: initialize audio_comp when audio starts (2021-05-06 16:26:57 -0700)

are available in the Git repository at:

   https://git.linaro.org/people/dmitry.baryshkov/kernel.git dsi-phy-update

for you to fetch changes up to f1fd3b113cbb98febad682fc11ea1c6e717434c2:

   drm/msm/dsi: remove msm_dsi_dphy_timing from msm_dsi_phy (2021-05-14 22:55:11 +0300)



Dmitry Baryshkov (8):
   arm64: dts: qcom: sc7180: assign DSI clock source parents
   arm64: dts: qcom: sdm845: assign DSI clock source parents
   arm64: dts: qcom: sdm845-mtp: assign DSI clock source parents
   arm64: dts: qcom: sm8250: assign DSI clock source parents
   drm/msm/dsi: stop setting clock parents manually
   drm/msm/dsi: phy: use of_device_get_match_data
   drm/msm/dsi: drop msm_dsi_phy_get_shared_timings
   drm/msm/dsi: remove msm_dsi_dphy_timing from msm_dsi_phy

  arch/arm64/boot/dts/qcom/sc7180.dtsi|  3 ++
  arch/arm64/boot/dts/qcom/sdm845-mtp.dts |  3 ++
  arch/arm64/boot/dts/qcom/sdm845.dtsi|  6 +++
  arch/arm64/boot/dts/qcom/sm8250.dtsi|  6 +++
  drivers/gpu/drm/msm/dsi/dsi.h   |  7 +---
  drivers/gpu/drm/msm/dsi/dsi_host.c  | 51 -
  drivers/gpu/drm/msm/dsi/dsi_manager.c   |  8 +---
  drivers/gpu/drm/msm/dsi/phy/dsi_phy.c   | 46 ++
  drivers/gpu/drm/msm/dsi/phy/dsi_phy.h   | 10 -
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_10nm.c  | 11 ++
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c  | 11 ++
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_20nm.c  | 10 +
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_28nm.c  | 12 ++
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_28nm_8960.c | 10 +
  drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c   | 13 ++-
  15 files changed, 67 insertions(+), 140 deletions(-)




Re: Change how amdgpu stores fences in dma_resv objects

2021-06-10 Thread Michel Dänzer
On 2021-06-10 11:17 a.m., Christian König wrote:
> Since we can't find a consensus on how to move forward with the dma_resv 
> object I concentrated on changing the approach for amdgpu first.
> 
> This new approach changes how the driver stores the command submission fence 
> in the dma_resv object in DMA-buf exported BOs.
> 
> For exported BOs we now store the CS fence in a dma_fence_chain container and 
> assign that one to the exclusive fences slot.
> 
> During synchronization this dma_fence_chain container is unpacked again and 
> the containing fences handled individually.
> 
> This has a little bit more overhead than the old approach, but it allows for 
> waiting for the exclusive slot for writes again.

Nice!

This seems to work as expected with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880: Some buffers now 
don't poll readable at first, until the GPU is done processing them.


Unfortunately, as expected, without a high priority context for the compositor 
which can preempt client drawing, this isn't enough to prevent slow clients 
from slowing down the compositor as well. But it should already help for 
fullscreen apps where the compositor can directly scan out the client buffers 
at least.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-10 Thread Christian König

Hi guys,

maybe soften that a bit. Reading from the shared memory of the user 
fence is ok for everybody. What we need to take more care of is the 
writing side.


So my current thinking is that we allow read only access, but writing a 
new sequence value needs to go through the scheduler/kernel.


So when the CPU wants to signal a timeline fence it needs to call an 
IOCTL. When the GPU wants to signal the timeline fence it needs to hand 
that of to the hardware scheduler.


If we lock up, the kernel can check with the hardware who did the last 
write and what value was written.


That together with an IOCTL to give out sequence number for implicit 
sync to applications should be sufficient for the kernel to track who is 
responsible if something bad happens.


In other words when the hardware says that the shader wrote stuff like 
0xdeadbeef 0x0 or 0x into memory we kill the process who did that.


If the hardware says that seq - 1 was written fine, but seq is missing 
then the kernel blames whoever was supposed to write seq.


Just piping the write through a privileged instance should be fine to 
make sure that we don't run into issues.


Christian.
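The blame rule sketched above (seq - 1 written fine but seq missing means the assigned writer is at fault) can be put into a toy model. The log layout and names below are invented for illustration and are not a real interface.

```c
#include <stdbool.h>

enum { MAX_SEQ = 64 };

/* Per-timeline record: for each sequence number, whether the hardware
 * access log shows it was actually written, and which client was
 * assigned by the kernel to write it. */
struct timeline_log {
    bool written[MAX_SEQ];
    int  writer[MAX_SEQ];
};

/* Find the client to blame for a stuck timeline at 'seq': walk forward
 * to the oldest missing write and return its assigned writer, or -1 if
 * everything up to and including 'seq' was written as expected. */
static int blame_for_hang(const struct timeline_log *tl, int seq)
{
    int missing = -1;

    for (int i = 0; i <= seq; i++) {
        if (!tl->written[i]) {
            missing = i;
            break;
        }
    }
    return missing < 0 ? -1 : tl->writer[missing];
}
```

The point of routing signals through a privileged instance is precisely that this log stays trustworthy: userspace can read sequence values freely, but only the kernel/scheduler path produces entries.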

On 10.06.21 at 17:59, Marek Olšák wrote:

Hi Daniel,

We just talked about this whole topic internally and we came up to the 
conclusion that the hardware needs to understand sync object handles 
and have high-level wait and signal operations in the command stream. 
Sync objects will be backed by memory, but they won't be readable or 
writable by processes directly. The hardware will log all accesses to 
sync objects and will send the log to the kernel periodically. The 
kernel will identify malicious behavior.


Example of a hardware command stream:
...
ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence number is assigned by the kernel

Draw();
ImplicitSyncSignalWhenDone(syncObjHandle);
...

I'm afraid we have no other choice because of the TLB invalidation 
overhead.


Marek


On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter wrote:


On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> Am 09.06.21 um 15:19 schrieb Daniel Vetter:
> > [SNIP]
> > > Yeah, we call this the lightweight and the heavyweight tlb flush.
> > >
> > > The lightweight can be used when you are sure that you don't have any of the
> > > PTEs currently in flight in the 3D/DMA engine and you just need to
> > > invalidate the TLB.
> > >
> > > The heavyweight must be used when you need to invalidate the TLB *AND* make
> > > sure that no concurrent operation moves new stuff into the TLB.
> > >
> > > The problem is for this use case we have to use the heavyweight one.
> > Just for my own curiosity: So the lightweight flush is only for in-between
> > CS when you know access is idle? Or does that also not work if userspace
> > has a CS on a dma engine going at the same time because the tlb aren't
> > isolated enough between engines?
>
> More or less correct, yes.
>
> The problem is a lightweight flush only invalidates the TLB, but doesn't
> take care of entries which have been handed out to the different engines.
>
> In other words what can happen is the following:
>
> 1. Shader asks TLB to resolve address X.
> 2. TLB looks into its cache and can't find address X so it asks the walker
> to resolve.
> 3. Walker comes back with result for address X and TLB puts that into its
> cache and gives it to Shader.
> 4. Shader starts doing some operation using result for address X.
> 5. You send lightweight TLB invalidate and TLB throws away cached values for
> address X.
> 6. Shader happily still uses whatever the TLB gave to it in step 3 to
> access address X.
>
> See it like the shader has their own 1 entry L0 TLB cache which is not
> affected by the lightweight flush.
>
> The heavyweight flush on the other hand sends out a broadcast signal to
> everybody and only comes back when we are sure that an address is not in use
> any more.
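The race Christian walks through can be captured in a toy model with a one-entry shader-local cache. This is only an illustration of the failure mode, with invented names; it is not real hardware or driver code.

```c
#include <stdbool.h>

/* Toy model: the TLB proper, plus the shader's private 1-entry L0 copy. */
struct tlb    { int pte; bool valid; };
struct shader { int pte; bool valid; };

/* Steps 1-4: shader resolves address X through the TLB (the walker
 * result is passed in) and keeps the translation in its own L0 copy. */
static int shader_resolve(struct shader *s, struct tlb *t, int walker_pte)
{
    if (!t->valid) {
        t->pte = walker_pte;       /* step 3: walker fills the TLB */
        t->valid = true;
    }
    s->pte = t->pte;               /* shader holds its own copy */
    s->valid = true;
    return s->pte;
}

/* Lightweight flush (step 5): only the TLB cache is dropped; the
 * shader's held entry survives, so step 6 can still use stale data. */
static void flush_lightweight(struct tlb *t)
{
    t->valid = false;
}

/* Heavyweight flush: broadcast that only completes once no engine still
 * holds the entry -- modeled here by also dropping the shader's copy. */
static void flush_heavyweight(struct tlb *t, struct shader *s)
{
    t->valid = false;
    s->valid = false;
}
```

In the model, the stale `s->valid` entry after a lightweight flush is exactly the "1 entry L0 TLB" the shader keeps using in step 6, which is why the unmap-before-reuse case needs the heavyweight variant.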

Ah makes sense. On intel the shaders only operate in VA, everything goes
around as explicit async messages to IO blocks. So we don't have this, the
only difference in tlb flushes is between tlb flush in the IB and an mmio
one which is independent for anything currently being executed on an
engine.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch





Re: [Intel-gfx] [PATCH 1/5] drm/i915: Move intel_engine_free_request_pool to i915_request.c

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 10:07 AM Tvrtko Ursulin
 wrote:
>
> On 10/06/2021 14:57, Jason Ekstrand wrote:
> > On Thu, Jun 10, 2021 at 5:04 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> On 09/06/2021 22:29, Jason Ekstrand wrote:
> >>> This appears to break encapsulation by moving an intel_engine_cs
> >>> function to a i915_request file.  However, this function is
> >>> intrinsically tied to the lifetime rules and allocation scheme of
> >>> i915_request and having it in intel_engine_cs.c leaks details of
> >>> i915_request.  We have an abstraction leak either way.  Since
> >>> i915_request's allocation scheme is far more subtle than the simple
> >>> pointer that is intel_engine_cs.request_pool, it's probably better to
> >>> keep i915_request's details to itself.
> >>>
> >>> Signed-off-by: Jason Ekstrand 
> >>> Cc: Jon Bloomfield 
> >>> Cc: Daniel Vetter 
> >>> Cc: Matthew Auld 
> >>> Cc: Maarten Lankhorst 
> >>> ---
> >>>drivers/gpu/drm/i915/gt/intel_engine_cs.c | 8 
> >>>drivers/gpu/drm/i915/i915_request.c   | 7 +--
> >>>drivers/gpu/drm/i915/i915_request.h   | 2 --
> >>>3 files changed, 5 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> >>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> >>> index 9ceddfbb1687d..df6b80ec84199 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> >>> @@ -422,14 +422,6 @@ void intel_engines_release(struct intel_gt *gt)
> >>>}
> >>>}
> >>>
> >>> -void intel_engine_free_request_pool(struct intel_engine_cs *engine)
> >>> -{
> >>> - if (!engine->request_pool)
> >>> - return;
> >>> -
> >>> - kmem_cache_free(i915_request_slab_cache(), engine->request_pool);
> >>
> >> Argument that the slab cache shouldn't be exported from i915_request.c
> >> sounds good to me.
> >>
> >> But I think step better than simply reversing the break of encapsulation
> >> (And it's even worse because it leaks much higher level object!) could
> >> be to export a freeing helper from i915_request.c, engine pool would
> >> then use:
> >>
> >> void __i915_request_free(...)
> >> {
> >>  kmem_cache_free(...);
> >> }
> >
> > That was what I did at first.  However, the semantics of how the
> > pointer is touched/modified are really also part of i915_request.  In
> > particular, the use of xchg and cmpxchg.  So I pulled the one other
> > access (besides NULL initializing) into i915_request.c which meant
> > pulling in intel_engine_free_request_pool.
>
> Hmmm in my view the only break of encapsulation at the moment is that
> intel_engine_cs.c knows requests have been allocated from a dedicated slab.
>
> Semantics of how the request pool pointer is managed, so xchg and
> cmpxchg, already are in i915_request.c so I don't exactly follow what is
> the problem with wrapping the knowledge on how requests should be freed
> inside i915_request.c as well?
>
> Unless you view the fact intel_engine_cs contains a pointer to
> i915_request a break as well? But even then... 
>
> > Really, if we wanted proper encapsulation here, we'd have
> >
> > struct i915_request_cache {
> >  struct i915_request *rq;
> > };
> >
> > void i915_request_cache_init(struct i915_request_cache *cache);
> > void i915_request_cache_finish(struct i915_request_cache *cache);
> >
> > all in i915_request.h and have all the gory details inside
> > > i915_request.c.  Then all intel_engine_cs knows is that it has a request
> > > cache.
>
> ... with this scheme you'd have intel_engine_cs contain a pointer to
> i915_request_cache,

No, it would contain an i915_request_cache, not a pointer to one.  It
wouldn't fundamentally change any data structures; just add wrapping.

> which does not seem a particularly exciting
> improvement to me since the wrapping would be extremely thin with no
> fundamental changes.

Yeah, it's not particularly exciting.

> So for me exporting new __i915_request_free() from i915_request.c makes
> things a bit better and I don't think we need to go further than that.

I'm not sure it's necessary either.  The thing that bothers me is that
we have this pointer that's clearly managed by i915_request.c but is
initialized and finished by intel_engine_cs.c.  Certainly adding an
i915_request_free() is better than what we have today.  I'm not sure
it's enough better to really make me happy but, TBH, the whole request
cache thing is a bit of a mess.

> I mean there is the issue of i915_request.c knowing about engines having
> request pools, but I am not sure if with i915_request_cache you proposed
> to remove that knowledge and how?

It doesn't, really.  As long as we're stashing a request in the
engine, there's still an encapsulation problem no matter what we do.

>  From the design point of view, given request pool is used only for
> engine pm, clean design could be to manage this from engine pm. Like if
> parking cannot use GFP_KERNEL then check if unparking can and ex

Re: [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

2021-06-10 Thread Jason Ekstrand
On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter  wrote:
>
> On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand  wrote:
> >
> > On Thu, Jun 10, 2021 at 1:51 AM Christian König
> >  wrote:
> > >
> > > Am 09.06.21 um 23:29 schrieb Jason Ekstrand:
> > > > This helper existed to handle the weird corner-cases caused by using
> > > > SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
> > > > that anymore (i915 was the only real user), dma_fence_get_rcu is
> > > > sufficient.  The one slightly annoying thing we have to deal with here
> > > > is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
> > > > SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
> > > > ends up being 3 lines instead of 1.
> > >
> > > That's an outright NAK.
> > >
> > > The loop in dma_fence_get_rcu_safe is necessary because the underlying
> > > fence object can be replaced while taking the reference.
> >
> > Right.  I had missed a bit of that when I first read through it.  I
> > see the need for the loop now.  But there are some other tricky bits
> > in there besides just the loop.
>
> I thought that's what the kref_get_unless_zero was for in
> dma_fence_get_rcu? Otherwise I guess I'm not seeing why still have
> dma_fence_get_rcu around, since that should either be a kref_get or
> it's just unsafe to call it ...

AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
your fence and it's never recycled.

Where the loop comes in is if you have someone come along, under the
RCU write lock or not, and swap out the pointer and unref it while
you're trying to fetch it.  In this case, if you just write the three
lines I duplicated throughout this patch, you'll end up with NULL if
you (partially) lose the race.  The loop exists to ensure that you get
either the old pointer or the new pointer and you only ever get NULL
if somewhere during the mess, the pointer actually gets set to NULL.

I agree with Christian that that part of dma_fence_get_rcu_safe needs
to stay.  I was missing that until I did my giant "let's walk through
the code" e-mail.

--Jason

> > > This is completely unrelated to SLAB_TYPESAFE_BY_RCU. See the
> > > dma_fence_chain usage for reference.
> > >
> > > What you can remove is the sequence number handling in dma-buf. That
> > > should make adding fences quite a bit quicker.
> >
> > I'll look at that and try to understand what's going on there.
>
> Hm I thought the seqlock was to make sure we have a consistent set of
> fences across exclusive and all shared slot. Not to protect against
> the fence disappearing due to typesafe_by_rcu.
> -Daniel
>
> > --Jason
> >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > Signed-off-by: Jason Ekstrand 
> > > > Cc: Daniel Vetter 
> > > > Cc: Christian König 
> > > > Cc: Matthew Auld 
> > > > Cc: Maarten Lankhorst 
> > > > ---
> > > >   drivers/dma-buf/dma-fence-chain.c |  8 ++--
> > > >   drivers/dma-buf/dma-resv.c|  4 +-
> > > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  4 +-
> > > >   drivers/gpu/drm/i915/i915_active.h|  4 +-
> > > >   drivers/gpu/drm/i915/i915_vma.c   |  4 +-
> > > >   include/drm/drm_syncobj.h |  4 +-
> > > >   include/linux/dma-fence.h | 50 ---
> > > >   include/linux/dma-resv.h  |  4 +-
> > > >   8 files changed, 23 insertions(+), 59 deletions(-)
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence-chain.c 
> > > > b/drivers/dma-buf/dma-fence-chain.c
> > > > index 7d129e68ac701..46dfc7d94d8ed 100644
> > > > --- a/drivers/dma-buf/dma-fence-chain.c
> > > > +++ b/drivers/dma-buf/dma-fence-chain.c
> > > > @@ -15,15 +15,17 @@ static bool dma_fence_chain_enable_signaling(struct 
> > > > dma_fence *fence);
> > > >* dma_fence_chain_get_prev - use RCU to get a reference to the 
> > > > previous fence
> > > >* @chain: chain node to get the previous node from
> > > >*
> > > > - * Use dma_fence_get_rcu_safe to get a reference to the previous fence 
> > > > of the
> > > > - * chain node.
> > > > + * Use rcu_dereference and dma_fence_get_rcu to get a reference to the
> > > > + * previous fence of the chain node.
> > > >*/
> > > >   static struct dma_fence *dma_fence_chain_get_prev(struct 
> > > > dma_fence_chain *chain)
> > > >   {
> > > >   struct dma_fence *prev;
> > > >
> > > >   rcu_read_lock();
> > > > - prev = dma_fence_get_rcu_safe(&chain->prev);
> > > > + prev = rcu_dereference(chain->prev);
> > > > + if (prev)
> > > > + prev = dma_fence_get_rcu(prev);
> > > >   rcu_read_unlock();
> > > >   return prev;
> > > >   }
> > > > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> > > > index f26c71747d43a..cfe0db3cca292 100644
> > > > --- a/drivers/dma-buf/dma-resv.c
> > > > +++ b/drivers/dma-buf/dma-resv.c
> > > > @@ -376,7 +376,9 @@ int dma_resv_copy_fences(struct dma_resv *dst, 
> > > > struct dma_resv *src)
> > > >  
