Re: [PATCH 14/46] drm/i915: Expose logical engine instance to user

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 06:37:01PM +, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 04:30:06PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:11PM -0700, Matthew Brost wrote:
> > > Expose logical engine instance to user via query engine info IOCTL. This
> > > is required for split-frame workloads as these need to be placed on
> > > engines in a logically contiguous order. The logical mapping can change
> > > based on fusing. Rather than requiring the user to have knowledge of the
> > > fusing, we simply expose the logical mapping with the existing query
> > > engine info IOCTL.
> > > 
> > > Cc: Tvrtko Ursulin 
> > > Signed-off-by: Matthew Brost 
> > 
> > Uapi must have a link to the userspace MR/patch set using this, and to the
> > igt patch set validating it.
> > 
> 
> Have an IGT:
> https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
> 
> Not sure when the media UMD is going to be updated upstream to use this.
> Does that mean I can't merge this until the media UMD is ready? Seems
> like it but isn't that a circular dependency? How can the media team
> develop for a new uAPI that isn't in the kernel yet?

Yes and no. Full explainer here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

In the drm subsystem this is pretty much the only rule where if you break
it the book will be thrown at you with extreme prejudice.

Also wrt circular: If the UMDs aren't set up to test their branches against
kernel branches they need to fix their stuff. I know that internally
that's not been done, and it's a disaster, but in upstream there's no room
for excuses. Both kernel and userspace need to be in branches until it's
ready for merging.

> For what it is worth the downstream release is already using this.

Yeah which is another problem, shipping new uapi in downstream before it's
in upstream is decidedly not great.
-Daniel

> 
> Matt
> 
> > Ideally in each patch, since it's unfortunately way too hard to find the
> > cover letter later on.
> > 
> > Jason even went as far as making this a hard requirement because he wasted
> > a bit too much time trying to find the userspace for new uapi:
> > 
> > https://lore.kernel.org/dri-devel/20210804185704.624883-1-ja...@jlekstrand.net/
> > 
> > Cheers, Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/i915/i915_query.c | 2 ++
> > >  include/uapi/drm/i915_drm.h   | 8 +++-
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_query.c 
> > > b/drivers/gpu/drm/i915/i915_query.c
> > > index e49da36c62fb..8a72923fbdba 100644
> > > --- a/drivers/gpu/drm/i915/i915_query.c
> > > +++ b/drivers/gpu/drm/i915/i915_query.c
> > > @@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
> > >   for_each_uabi_engine(engine, i915) {
> > >   info.engine.engine_class = engine->uabi_class;
> > >   info.engine.engine_instance = engine->uabi_instance;
> > > + info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
> > >   info.capabilities = engine->uabi_capabilities;
> > > + info.logical_instance = ilog2(engine->logical_mask);
> > >  
> > >   if (copy_to_user(info_ptr, &info, sizeof(info)))
> > >   return -EFAULT;
> > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > > index 7f13d241417f..ef72e07fe08c 100644
> > > --- a/include/uapi/drm/i915_drm.h
> > > +++ b/include/uapi/drm/i915_drm.h
> > > @@ -2706,14 +2706,20 @@ struct drm_i915_engine_info {
> > >  
> > >   /** @flags: Engine flags. */
> > >   __u64 flags;
> > > +#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1 << 0)
> > >  
> > >   /** @capabilities: Capabilities of this engine. */
> > >   __u64 capabilities;
> > >  #define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0)
> > >  #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC  (1 << 1)
> > >  
> > > + /** @logical_instance: Logical instance of engine */
> > > + __u16 logical_instance;
> > > +
> > >   /** @rsvd1: Reserved fields. */
> > > - __u64 rsvd1[4];
> > > + __u16 rsvd1[3];
> > > + /** @rsvd2: Reserved fields. */
> > > + __u64 rsvd2[3];
> > >  };
> > >  
> > >  /**
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Letux-kernel] [PATCH 8/8] drm/ingenic: Attach bridge chain to encoders

2021-08-09 Thread Paul Boddie
On Monday, 9 August 2021 18:22:12 CEST Paul Cercueil wrote:
> 
> On Mon, Aug 9 2021 at 13:14:03 +0200, H. Nikolaus Schaller
> wrote:
> >
> > quick feedback: our HDMI on top compiles fine after fixing 2 merge
> > conflicts, but does not yet work.
> > Will need some spare time with access to the CI20 board to research
> > the issue, i.e. cannot give feedback immediately.
> 
> Alright, no problem. I'll be back home in about 2 weeks and then I can
> test on my CI20 as well.

Just for reference, I looked into this initialisation failure. The HDMI 
peripheral driver gets initialised satisfactorily...

dw-hdmi-ingenic 1018.hdmi: Detected HDMI TX controller v1.31a with HDCP 
(DWC HDMI 3D TX PHY)
dw-hdmi-ingenic 1018.hdmi: registered DesignWare HDMI I2C bus driver

But then the reported error occurs in the DRM driver:

ingenic-drm 1305.lcdc0: Unable to init connector
ingenic-drm: probe of 1305.lcdc0 failed with error -22

This originates in a call to drm_bridge_connector_init from ingenic_drm_bind:

connector = drm_bridge_connector_init(drm, encoder);

The invoked function iterates over the registered bridges, one of which seems 
to be the HDMI peripheral (it has bridge operations defined identically to 
those specified in the Synopsys driver), but the type member of the drm_bridge 
structure is set to 0 (DRM_MODE_CONNECTOR_Unknown).

I might expect the bridge to expose a type acquired from its connector, but I 
don't see this propagation occurring in the Synopsys driver: dw_hdmi_probe 
sets the bridge operations and other members of the drm_bridge structure, but 
it doesn't set the type.

Also, it might be possible that dw_hdmi_connector_detect (exposed as the 
detect operation) is not getting called, and this would explain why the 
bridge's connector member does not have the connector_type set, either (since 
it is also set to 0).

Paul




Re: [PATCH 13/46] drm/i915: Add logical engine mapping

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 06:28:58PM +, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 04:28:04PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:10PM -0700, Matthew Brost wrote:
> > > Add logical engine mapping. This is required for split-frame, as
> > > workloads need to be placed on engines in a logically contiguous manner.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 60 ---
> > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
> > >  .../drm/i915/gt/intel_execlists_submission.c  |  1 +
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  2 +-
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 21 +--
> > >  5 files changed, 56 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> > > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index 0d9105a31d84..4d790f9a65dd 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -290,7 +290,8 @@ static void nop_irq_handler(struct intel_engine_cs 
> > > *engine, u16 iir)
> > >   GEM_DEBUG_WARN_ON(iir);
> > >  }
> > >  
> > > -static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id 
> > > id)
> > > +static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id 
> > > id,
> > > +   u8 logical_instance)
> > >  {
> > >   const struct engine_info *info = &intel_engines[id];
> > >   struct drm_i915_private *i915 = gt->i915;
> > > @@ -334,6 +335,7 @@ static int intel_engine_setup(struct intel_gt *gt, 
> > > enum intel_engine_id id)
> > >  
> > >   engine->class = info->class;
> > >   engine->instance = info->instance;
> > > + engine->logical_mask = BIT(logical_instance);
> > >   __sprint_engine_name(engine);
> > >  
> > >   engine->props.heartbeat_interval_ms =
> > > @@ -572,6 +574,37 @@ static intel_engine_mask_t init_engine_mask(struct 
> > > intel_gt *gt)
> > >   return info->engine_mask;
> > >  }
> > >  
> > > +static void populate_logical_ids(struct intel_gt *gt, u8 *logical_ids,
> > > +  u8 class, const u8 *map, u8 num_instances)
> > > +{
> > > + int i, j;
> > > + u8 current_logical_id = 0;
> > > +
> > > + for (j = 0; j < num_instances; ++j) {
> > > + for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> > > + if (!HAS_ENGINE(gt, i) ||
> > > + intel_engines[i].class != class)
> > > + continue;
> > > +
> > > + if (intel_engines[i].instance == map[j]) {
> > > + logical_ids[intel_engines[i].instance] =
> > > + current_logical_id++;
> > > + break;
> > > + }
> > > + }
> > > + }
> > > +}
> > > +
> > > +static void setup_logical_ids(struct intel_gt *gt, u8 *logical_ids, u8 
> > > class)
> > > +{
> > > + int i;
> > > + u8 map[MAX_ENGINE_INSTANCE + 1];
> > > +
> > > + for (i = 0; i < MAX_ENGINE_INSTANCE + 1; ++i)
> > > + map[i] = i;
> > > + populate_logical_ids(gt, logical_ids, class, map, ARRAY_SIZE(map));
> > > +}
> > > +
> > >  /**
> > >   * intel_engines_init_mmio() - allocate and prepare the Engine Command 
> > > Streamers
> > >   * @gt: pointer to struct intel_gt
> > > @@ -583,7 +616,8 @@ int intel_engines_init_mmio(struct intel_gt *gt)
> > >   struct drm_i915_private *i915 = gt->i915;
> > >   const unsigned int engine_mask = init_engine_mask(gt);
> > >   unsigned int mask = 0;
> > > - unsigned int i;
> > > + unsigned int i, class;
> > > + u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
> > >   int err;
> > >  
> > >   drm_WARN_ON(&i915->drm, engine_mask == 0);
> > > @@ -593,15 +627,23 @@ int intel_engines_init_mmio(struct intel_gt *gt)
> > >   if (i915_inject_probe_failure(i915))
> > >   return -ENODEV;
> > >  
> > > - for (i = 0; i < ARRAY_SIZE(intel_engines); i++) {
> > > - if (!HAS_ENGINE(gt, i))
> > > - continue;
> > > + for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
> > > + setup_logical_ids(gt, logical_ids, class);
> > >  
> > > - err = intel_engine_setup(gt, i);
> > > - if (err)
> > > - goto cleanup;
> > > + for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> > > + u8 instance = intel_engines[i].instance;
> > > +
> > > + if (intel_engines[i].class != class ||
> > > + !HAS_ENGINE(gt, i))
> > > + continue;
> > >  
> > > - mask |= BIT(i);
> > > + err = intel_engine_setup(gt, i,
> > > +  logical_ids[instance]);
> > > + if (err)
> > > + goto cleanup;
> > > +
> > > + mask |= BIT(i);
> > > + }
> > >   }
> > >  
> > >   /*
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> > > b/drivers/

Re: [Intel-gfx] [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 06:20:51PM +, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 04:27:01PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> > > Calling switch_to_kernel_context isn't needed if the engine PM reference
> > > is taken while all contexts are pinned. By not calling
> > > switch_to_kernel_context we save on issuing a request to the engine.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
> > > b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > index 1f07ac4e0672..58099de6bf07 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > > @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct 
> > > intel_engine_cs *engine)
> > >   unsigned long flags;
> > >   bool result = true;
> > >  
> > > + /* No need to switch_to_kernel_context if GuC submission */
> > 
> > Maybe whack a big FIXME on here that we should unravel this properly.
> 
> Sure, can add a FIXME here.
> 
> > Currently the execlist backend assumptions are leaked all over the place,
> > leading to stuff like this. Which means extremely fragile code.
> >
> 
> Yes, this is something required for execlists implemented in what should be
> generic code. 
> 
> > I currently don't have a great idea on how exactly we should do that, but
> > oh well.
> 
> Me either, it will be a process.
> 
> > 
> > btw just in case we ever want to make guc lrc properly evictable (which was
> > the original use-case for this function, way, way back), would we need to fully
> 
> Can you explain what you mean by fully evictable? Not getting what you
> mean in this context.
> 
> > unregister them from guc? At least I'm assuming there's no other trick
> 
> If scheduling is disabled on the context (currently done on unpin) you are
> free to move anything around as the GuC is guaranteed not to touch the
> context state. If on re-pin something has moved (e.g. the LRC vaddr is
> different), you need to unregister and re-register the context with the
> GuC.

So at that point GuC also guarantees that it's not left in the hw engine?
Execlist has this barrier request to fully unload the ctx from the hw, and
that's also why I came upon the topic of OA.

> > like the below one.
> > 
> > Another aside: How does the perf/OA patching work on GuC?
> >
> 
> Not my area of expertise but perf is somewhat a WIP. The plan is for the
> GuC to write out some stats to HWSP I think? John Harrison is working to
> get this fully implemented.
> 
> OA is working afaik, with Umesh Nerlige Ramappa being the expert here.

I think it's OA that I'm thinking of here: We have code in i915_perf.c to
patch all the ctx currently in the system, so that they have a consistent
OA config. That's also relying on this barrier stuff, and I was wondering
how that will work with GuC.
-Daniel

> 
> Matt
> 
> > Anyway, patch looks legit:
> > 
> > Reviewed-by: Daniel Vetter 
> > 
> > 
> > > + if (intel_engine_uses_guc(engine))
> > > + return true;
> > > +
> > >   /* GPU is pointing to the void, as good as in the kernel context. */
> > >   if (intel_gt_is_wedged(engine->gt))
> > >   return true;
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 06:11:37PM +, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 04:23:42PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:07PM -0700, Matthew Brost wrote:
> > > Take a PM reference to prevent intel_gt_wait_for_idle from
> > > short-circuiting while scheduling of a user context could be enabled.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/i915/Makefile |  1 +
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
> > >  2 files changed, 34 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > > index 903de270f2db..5e3a1e2095b0 100644
> > > --- a/drivers/gpu/drm/i915/Makefile
> > > +++ b/drivers/gpu/drm/i915/Makefile
> > > @@ -103,6 +103,7 @@ gt-y += \
> > >   gt/intel_gt_clock_utils.o \
> > >   gt/intel_gt_irq.o \
> > >   gt/intel_gt_pm.o \
> > > + gt/intel_gt_pm_unpark_work.o \
> > 
> > This file isn't here?
> > 
> 
> Yep, included this in the wrong patch. Should be in:
> https://patchwork.freedesktop.org/patch/448462/?series=92789&rev=2
> 
> > Also pm stuff tends to have very nasty locking requirements, doing special
> > stuff like this in the backend tends to lead to really big surprises. I
> > think two options to make sure our locking design stays consistent:
> > - Lift this to generic code.
> 
> Not sure I'm following this, intel_engine_pm_get/put are generic calls.
> Those calls should have all the correct annotations. If they don't we can
> add them.

But you only call them in the GuC backend, not in all of them. Which is an
inconsistency in locking, and unfortunately runtime pm is extremely nasty,
so having potentially very divergent locking behind the same interface in
the same driver is a recipe for an unmaintainable mess.

Iow, if the high-level code runs on execlist or the ringbuffer backend we
still need to go through at least the lockdep motions of what you're
adding here.

This is similar in spirit to all the might_sleep/might_lock calls we have
all over the kernel where in many cases something doesn't happen, but we
need to make sure it's allowed to have a consistent design.

So essentially in the intel_context_pin and all these functions put a
intel_engine_pm_might_get (which compiles out without debugging enabled),
unconditionally, across all platforms and sched backends.

In general I think backend specific locking (irrespective of what kind of
backend or interface you implement) is a pretty bad idea in the kernel,
and needs to be avoided if at all possible. Avoid here means "pull the
might_lock/might_sleep/might_whatever checks into generic code".
-Daniel

> Matt
> 
> > - expose some engine_pm_might_get/put() calls which do have the right set
> >   of might_lock annotations, and call those in the generic code.
> > 
> > Imo the worst kernel abstractions are those where all implementations
> > look&act the same, except for locking. Unfortunately i915-gem code is full
> > of this stuff, and we need to stop this by enlisting lockdep to check the
> > contracts for us.
> > -Daniel
> > 
> > >   gt/intel_gt_pm_irq.o \
> > >   gt/intel_gt_requests.o \
> > >   gt/intel_gtt.o \
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 7fe4d1559a81..c5d9548bfd00 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -2056,7 +2056,12 @@ static int guc_context_pre_pin(struct 
> > > intel_context *ce,
> > >  
> > >  static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > >  {
> > > - return __guc_context_pin(ce, ce->engine, vaddr);
> > > + int ret = __guc_context_pin(ce, ce->engine, vaddr);
> > > +
> > > + if (likely(!ret && !intel_context_is_barrier(ce)))
> > > + intel_engine_pm_get(ce->engine);
> > > +
> > > + return ret;
> > >  }
> > >  
> > >  static void guc_context_unpin(struct intel_context *ce)
> > > @@ -2067,6 +2072,9 @@ static void guc_context_unpin(struct intel_context 
> > > *ce)
> > >  
> > >   unpin_guc_id(guc, ce, true);
> > >   lrc_unpin(ce);
> > > +
> > > + if (likely(!intel_context_is_barrier(ce)))
> > > + intel_engine_pm_put(ce->engine);
> > >  }
> > >  
> > >  static void guc_context_post_unpin(struct intel_context *ce)
> > > @@ -3002,8 +3010,30 @@ static int guc_virtual_context_pre_pin(struct 
> > > intel_context *ce,
> > >  static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> > >  {
> > >   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > > + int ret = __guc_context_pin(ce, engine, vaddr);
> > > + intel_engine_mask_t tmp, mask = ce->engine->mask;
> > > +
> > > + if (likely(!ret))
> > > + for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > > + intel_engine_pm_get(engine);
> > >  
> > > - return __guc_context_pin(ce, engine, vaddr

Re: [PATCH 0/1] Fix gem_ctx_persistence failures with GuC submission

2021-08-09 Thread Daniel Vetter
On Wed, Jul 28, 2021 at 05:33:59PM -0700, Matthew Brost wrote:
> Should fix below failures with GuC submission for the following tests:
> gem_exec_balancer --r noheartbeat
> gem_ctx_persistence --r heartbeat-close
> 
> Not going to fix:
> gem_ctx_persistence --r heartbeat-many
> gem_ctx_persistence --r heartbeat-stop

After looking at that big thread and being very confused: Are we fixing an
actual use-case here, or is this another case of blindly following igt
tests just because they exist?

I'm leaning towards that we should stall on this, and first document what
exactly is the actual intention behind all this, and then fix up the tests
to match (if needed). And only then fix up GuC to match whatever we
actually want to do.
-Daniel

> 
> The above tests change the heartbeat value to 0 (off) after the
> context is closed, and we have no way to detect that with GuC submission
> unless we keep a list of closed but running contexts, which seems like
> overkill for a non-real-world use case. We likely should just skip these
> tests with GuC submission.
> 
> Signed-off-by: Matthew Brost 
> 
> Matthew Brost (1):
>   drm/i915: Check if engine has heartbeat when closing a context
> 
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
>  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
>  .../drm/i915/gt/intel_execlists_submission.c  | 14 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
>  6 files changed, 26 insertions(+), 24 deletions(-)
> 
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote:
> On 8/6/2021 12:46, Daniel Vetter wrote:
> > Seen this fly by and figured I'd drop a few thoughts in here. At the
> > likely cost of looking a bit out of whack :-)
> > 
> > On Fri, Aug 6, 2021 at 8:01 PM John Harrison  
> > wrote:
> > > On 8/2/2021 02:40, Tvrtko Ursulin wrote:
> > > > On 30/07/2021 19:13, John Harrison wrote:
> > > > > On 7/30/2021 02:49, Tvrtko Ursulin wrote:
> > > > > > On 30/07/2021 01:13, John Harrison wrote:
> > > > > > > On 7/28/2021 17:34, Matthew Brost wrote:
> > > > > > > > If an engine associated with a context does not have a 
> > > > > > > > heartbeat,
> > > > > > > > ban it
> > > > > > > > immediately. This is needed for GuC submission as a idle pulse
> > > > > > > > doesn't
> > > > > > > > kick the context off the hardware where it then can check for a
> > > > > > > > heartbeat and ban the context.
> > > > > > Pulse, that is a request with I915_PRIORITY_BARRIER, does not
> > > > > > preempt a running normal priority context?
> > > > > > 
> > > > > > Why does it matter then whether or not heartbeats are enabled - when
> > > > > > heartbeat just ends up sending the same engine pulse (eventually,
> > > > > > with raising priority)?
> > > > > The point is that the pulse is pointless. See the rest of my comments
> > > > > below, specifically "the context will get resubmitted to the hardware
> > > > > after the pulse completes". To re-iterate...
> > > > > 
> > > > > Yes, it preempts the context. Yes, it does so whether heartbeats are
> > > > > enabled or not. But so what? Who cares? You have preempted a context.
> > > > > It is no longer running on the hardware. BUT IT IS STILL A VALID
> > > > > CONTEXT.
> > > > It is valid yes, and it even may be the current ABI so another
> > > > question is whether it is okay to change that.
> > > > 
> > > > > The backend scheduler will just resubmit it to the hardware as soon
> > > > > as the pulse completes. The only reason this works at all is because
> > > > > of the horrid hack in the execlist scheduler's back end
> > > > > implementation (in __execlists_schedule_in):
> > > > >   if (unlikely(intel_context_is_closed(ce) &&
> > > > >!intel_engine_has_heartbeat(engine)))
> > > > >   intel_context_set_banned(ce);
> > > > Right, is the above code then needed with this patch - when ban is
> > > > immediately applied on the higher level?
> > > > 
> > > > > The actual back end scheduler is saying "Is this a zombie context? Is
> > > > > the heartbeat disabled? Then ban it". No other scheduler backend is
> > > > > going to have knowledge of zombie context status or of the heartbeat
> > > > > status. Nor are they going to call back into the higher levels of the
> > > > > i915 driver to trigger a ban operation. Certainly a hardware
> > > > > implemented scheduler is not going to be looking at private i915
> > > > > driver information to decide whether to submit a context or whether
> > > > > to tell the OS to kill it off instead.
> > > > > 
> > > > > For persistence to work with a hardware scheduler (or a non-Intel
> > > > > specific scheduler such as the DRM one), the handling of zombie
> > > > > contexts, banning, etc. *must* be done entirely in the front end. It
> > > > > cannot rely on any backend hacks. That means you can't rely on any
> > > > > fancy behaviour of pulses.
> > > > > 
> > > > > If you want to ban a context then you must explicitly ban that
> > > > > context. If you want to ban it at some later point then you need to
> > > > > track it at the top level as a zombie and then explicitly ban that
> > > > > zombie at whatever later point.
> > > > I am still trying to understand it all. If I go by the commit message:
> > > > 
> > > > """
> > > > This is needed for GuC submission as a idle pulse doesn't
> > > > kick the context off the hardware where it then can check for a
> > > > heartbeat and ban the context.
> > > > """
> > > > 
> > > > That did not explain things for me. Sentence does not appear to make
> > > > sense. Now, it seems "kick off the hardware" is meant as revoke and
> > > > not just preempt. Which is fine, perhaps just needs to be written more
> > > > explicitly. But the part of checking for heartbeat after idle pulse
> > > > does not compute for me. It is the heartbeat which emits idle pulses,
> > > > not idle pulse emitting heartbeats.
> > > I am in agreement that the commit message is confusing and does not
> > > explain either the problem or the solution.
> > > 
> > > 
> > > > 
> > > > But anyway, I can buy the handling at the front end story completely.
> > > > It makes sense. We just need to agree that a) it is okay to change the
> > > > ABI and b) remove the backend check from execlists if it is not needed
> > > > any longer.
> > > > 
> > > > And if ABI change is okay then commit message needs to talk about it
> > > > loudly and clearly.
> > > I don't think we have a choice. The current ABI is not and can

Re: [Intel-gfx] linux-next: Signed-off-by missing for commit in the drm-intel tree

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 09:19:39AM -0700, Matt Roper wrote:
> On Mon, Aug 09, 2021 at 04:05:59PM +0200, Daniel Vetter wrote:
> > On Fri, Aug 06, 2021 at 09:36:56AM +0300, Joonas Lahtinen wrote:
> > > Hi Matt,
> > > 
> > > Always use the dim tooling when applying patches, it will do the right
> > > thing with regards to adding the S-o-b.
> > 
> > fd.o server rejects any pushes that haven't been done by dim, so how did
> > this get through?
> 
> I definitely used dim for all of these patches, but I'm not sure how I
> lost my s-o-b on this one.  Maybe when I edited the commit message after
> 'dim extract-tags' I accidentally deleted an extra line when I removed
> the extract-tags marker?  It's the only patch where the line is missing,
> so it's almost certainly human error on my part rather than something
> dim did wrong.

Yeah that's an expected failure model, and dim is supposed to catch that
by rechecking for sobs when you push. See dim_push_branch ->
checkpatch_commit_push_range in dim. So you can hand-edit stuff however
you want, dim /should/ catch it when pushing. That it didn't is kinda
confusing and I'd like to know why that slipped through.

> > Matt, can you pls figure out and type up the patch to
> > plug that hole?
> 
> Are you referring to a patch for dim here?  The i915 patch has already
> landed, so we can't change its commit message now.

Yeah dim, not drm-intel, that can't be fixed anymore because it's all
baked in.
-Daniel

> 
> 
> Matt
> 
> > 
> > Thanks, Daniel
> > 
> > > 
> > > Regards, Joonas
> > > 
> > > Quoting Stephen Rothwell (2021-07-15 07:18:54)
> > > > Hi all,
> > > > 
> > > > Commit
> > > > 
> > > >   db47fe727e1f ("drm/i915/step: 
> > > > s/_revid_tbl/_revids")
> > > > 
> > > > is missing a Signed-off-by from its committer.
> > > > 
> > > > -- 
> > > > Cheers,
> > > > Stephen Rothwell
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> VTT-OSGC Platform Enablement
> Intel Corporation
> (916) 356-2795

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH v2] drm/mediatek: Add component_del in OVL and COLOR remove function

2021-08-09 Thread jason-jh . lin
Add component_del calls to the OVL and COLOR remove functions.

Fixes: ff1395609e20 ("drm/mediatek: Move mtk_ddp_comp_init() from sub driver to 
DRM driver")
Signed-off-by: jason-jh.lin 
---
Change in v2:
- add component_del function in COLOR remove function
---
 drivers/gpu/drm/mediatek/mtk_disp_color.c | 2 ++
 drivers/gpu/drm/mediatek/mtk_disp_ovl.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/mediatek/mtk_disp_color.c 
b/drivers/gpu/drm/mediatek/mtk_disp_color.c
index 6f4c80bbc0eb..473f5bb5cbad 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_color.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_color.c
@@ -133,6 +133,8 @@ static int mtk_disp_color_probe(struct platform_device 
*pdev)
 
 static int mtk_disp_color_remove(struct platform_device *pdev)
 {
+   component_del(&pdev->dev, &mtk_disp_color_component_ops);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c 
b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
index fa9d79963cd3..5326989d5206 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
@@ -423,6 +423,8 @@ static int mtk_disp_ovl_probe(struct platform_device *pdev)
 
 static int mtk_disp_ovl_remove(struct platform_device *pdev)
 {
+   component_del(&pdev->dev, &mtk_disp_ovl_component_ops);
+
return 0;
 }
 
-- 
2.18.0



[PATCH v2] drm/mediatek: add AAL output size configuration

2021-08-09 Thread jason-jh . lin
To avoid incorrect output width and height, the
AAL_OUTPUT_SIZE configuration should be set.

Fixes: 0664d1392c26 ("drm/mediatek: Add AAL engine basic function")
Signed-off-by: jason-jh.lin 
---
Change in v2:
- fix to one line
---
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c 
b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
index 75bc00e17fc4..50d20562e612 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
@@ -34,6 +34,7 @@
 
 #define DISP_AAL_EN 0x
 #define DISP_AAL_SIZE  0x0030
+#define DISP_AAL_OUTPUT_SIZE   0x04d8
 
 #define DISP_DITHER_EN 0x
 #define DITHER_EN  BIT(0)
@@ -197,6 +198,7 @@ static void mtk_aal_config(struct device *dev, unsigned int 
w,
struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
 
mtk_ddp_write(cmdq_pkt, w << 16 | h, &priv->cmdq_reg, priv->regs, 
DISP_AAL_SIZE);
+   mtk_ddp_write(cmdq_pkt, w << 16 | h, &priv->cmdq_reg, priv->regs, 
DISP_AAL_OUTPUT_SIZE);
 }
 
 static void mtk_aal_gamma_set(struct device *dev, struct drm_crtc_state *state)
-- 
2.18.0



[PATCH] drm/arm/malidp: fix mode_valid couldn't cull invalid modes issue

2021-08-09 Thread Sandor . yu
From: Sandor Yu 

In malidp_crtc_mode_valid, mode->crtc_clock is 0 when called from
drm_helper_probe_single_connector_modes, so invalid video modes are not
culled and all modes end up on the connector's mode list. This is not
what mode_valid is expected to do.

Replace mode->crtc_clock with mode->clock to fix the issue.

Signed-off-by: Sandor Yu 
---
 drivers/gpu/drm/arm/malidp_crtc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/arm/malidp_crtc.c 
b/drivers/gpu/drm/arm/malidp_crtc.c
index 494075ddbef6..55890334385d 100644
--- a/drivers/gpu/drm/arm/malidp_crtc.c
+++ b/drivers/gpu/drm/arm/malidp_crtc.c
@@ -31,7 +31,7 @@ static enum drm_mode_status malidp_crtc_mode_valid(struct drm_crtc *crtc,
	 * check that the hardware can drive the required clock rate,
	 * but skip the check if the clock is meant to be disabled (req_rate = 0)
	 */
-	long rate, req_rate = mode->crtc_clock * 1000;
+	long rate, req_rate = mode->clock * 1000;
 
if (req_rate) {
rate = clk_round_rate(hwdev->pxlclk, req_rate);
-- 
2.17.1



[PATCH] dma_heap: enable map_attrs when (un)map_attachment

2021-08-09 Thread guangming.cao
From: Guangming Cao 

dma-buf importers can ask to skip CPU cache sync by setting
DMA_ATTR_SKIP_CPU_SYNC in the attachment's map attributes, but the
dma-heap exporters currently ignore those attributes when (un)mapping
an iova.

To keep behavior consistent with other exporters, honor the
attachment's dma_map_attrs in the heaps' (un)map_attachment callbacks.

Signed-off-by: Guangming Cao 
---
 drivers/dma-buf/heaps/cma_heap.c| 6 --
 drivers/dma-buf/heaps/system_heap.c | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 0c05b79870f9..2c9feb3bfc3e 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -99,9 +99,10 @@ static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment *attachme
 {
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = &a->table;
+   int attrs = attachment->dma_map_attrs;
int ret;
 
-   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   ret = dma_map_sgtable(attachment->dev, table, direction, attrs);
if (ret)
return ERR_PTR(-ENOMEM);
a->mapped = true;
@@ -113,9 +114,10 @@ static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
   enum dma_data_direction direction)
 {
struct dma_heap_attachment *a = attachment->priv;
+   int attrs = attachment->dma_map_attrs;
 
a->mapped = false;
-   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+   dma_unmap_sgtable(attachment->dev, table, direction, attrs);
 }
 
 static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 23a7e74ef966..fc7b1e02988e 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -130,9 +130,10 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
 {
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = a->table;
+   int attrs = attachment->dma_map_attrs;
int ret;
 
-   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   ret = dma_map_sgtable(attachment->dev, table, direction, attrs);
if (ret)
return ERR_PTR(ret);
 
@@ -145,9 +146,10 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heap_attachment *a = attachment->priv;
+   int attrs = attachment->dma_map_attrs;
 
a->mapped = false;
-   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+   dma_unmap_sgtable(attachment->dev, table, direction, attrs);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-- 
2.17.1



[PATCH] drm/amd/display: remove variable backlight

2021-08-09 Thread zhaoxiao
The variable backlight only holds a register value that is returned
immediately afterwards. Clean up the code by removing the intermediate
variable and returning the register read directly.

Signed-off-by: zhaoxiao 
---
 drivers/gpu/drm/amd/display/dc/dce/dce_abm.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_abm.c b/drivers/gpu/drm/amd/display/dc/dce/dce_abm.c
index 874b132fe1d7..0808433185f8 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_abm.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_abm.c
@@ -177,23 +177,21 @@ static void dce_abm_init(struct abm *abm, uint32_t backlight)
 static unsigned int dce_abm_get_current_backlight(struct abm *abm)
 {
struct dce_abm *abm_dce = TO_DCE_ABM(abm);
-   unsigned int backlight = REG_READ(BL1_PWM_CURRENT_ABM_LEVEL);
 
/* return backlight in hardware format which is unsigned 17 bits, with
 * 1 bit integer and 16 bit fractional
 */
-   return backlight;
+   return REG_READ(BL1_PWM_CURRENT_ABM_LEVEL);
 }
 
 static unsigned int dce_abm_get_target_backlight(struct abm *abm)
 {
struct dce_abm *abm_dce = TO_DCE_ABM(abm);
-   unsigned int backlight = REG_READ(BL1_PWM_TARGET_ABM_LEVEL);
 
/* return backlight in hardware format which is unsigned 17 bits, with
 * 1 bit integer and 16 bit fractional
 */
-   return backlight;
+   return REG_READ(BL1_PWM_TARGET_ABM_LEVEL);
 }
 
 static bool dce_abm_set_level(struct abm *abm, uint32_t level)
-- 
2.20.1





Re: [PATCH] drm/msm/dp: add drm debug logs to dp_pm_resume/suspend

2021-08-09 Thread Stephen Boyd
Quoting Kuogee Hsieh (2021-08-09 14:58:57)
> Add drm debug logs to dp_pm_resume and dp_pm_suspend to help
> debug suspend/resume issues.
>
> Fixes: 355ab7428f09 ("drm/msm/dp: add debug logs to dp_pm_resume/suspend")

BTW, I have no idea what commit this is. Best to probably just drop it?


Re: [PATCH] drm/msm/dp: add drm debug logs to dp_pm_resume/suspend

2021-08-09 Thread Stephen Boyd
Quoting Kuogee Hsieh (2021-08-09 14:58:57)
> Add drm debug logs to dp_pm_resume and dp_pm_suspend to help
> debug suspend/resume issues.
>
> Fixes: 355ab7428f09 ("drm/msm/dp: add debug logs to dp_pm_resume/suspend")
> Signed-off-by: Kuogee Hsieh 
> ---

Reviewed-by: Stephen Boyd 


Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-09 Thread John Harrison

On 8/6/2021 12:46, Daniel Vetter wrote:

Saw this fly by and figured I'd drop a few thoughts in here, at the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat,
ban it
immediately. This is needed for GuC submission as a idle pulse
doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.

Pulse, that is a request with I915_PRIORITY_BARRIER, does not
preempt a running normal priority context?

Why does it matter then whether or not heartbeats are enabled - when
heartbeat just ends up sending the same engine pulse (eventually,
with raising priority)?

The point is that the pulse is pointless. See the rest of my comments
below, specifically "the context will get resubmitted to the hardware
after the pulse completes". To re-iterate...

Yes, it preempts the context. Yes, it does so whether heartbeats are
enabled or not. But so what? Who cares? You have preempted a context.
It is no longer running on the hardware. BUT IT IS STILL A VALID
CONTEXT.

It is valid yes, and it even may be the current ABI so another
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon
as the pulse completes. The only reason this works at all is because
of the horrid hack in the execlist scheduler's back end
implementation (in __execlists_schedule_in):
  if (unlikely(intel_context_is_closed(ce) &&
   !intel_engine_has_heartbeat(engine)))
  intel_context_set_banned(ce);

Right, is the above code then needed with this patch - when ban is
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is
the heartbeat disabled? Then ban it". No other scheduler backend is
going to have knowledge of zombie context status or of the heartbeat
status. Nor are they going to call back into the higher levels of the
i915 driver to trigger a ban operation. Certainly a hardware
implemented scheduler is not going to be looking at private i915
driver information to decide whether to submit a context or whether
to tell the OS to kill it off instead.

For persistence to work with a hardware scheduler (or a non-Intel
specific scheduler such as the DRM one), the handling of zombie
contexts, banning, etc. *must* be done entirely in the front end. It
cannot rely on any backend hacks. That means you can't rely on any
fancy behaviour of pulses.

If you want to ban a context then you must explicitly ban that
context. If you want to ban it at some later point then you need to
track it at the top level as a zombie and then explicitly ban that
zombie at whatever later point.

I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make
sense. Now, it seems "kick off the hardware" is meant as revoke and
not just preempt. Which is fine, perhaps just needs to be written more
explicitly. But the part of checking for heartbeat after idle pulse
does not compute for me. It is the heartbeat which emits idle pulses,
not idle pulse emitting heartbeats.

I am in agreement that the commit message is confusing and does not
explain either the problem or the solution.




But anyway, I can buy the handling at the front end story completely.
It makes sense. We just need to agree that a) it is okay to change the
ABI and b) remove the backend check from execlists if it is not needed
any longer.

And if ABI change is okay then commit message needs to talk about it
loudly and clearly.

I don't think we have a choice. The current ABI is not and cannot ever
be compatible with any scheduler external to i915. It cannot be
implemented with a hardware scheduler such as the GuC and it cannot be
implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.

Helper libraries won't work with a hardware scheduler.




My view is that any implementation involving knowledge of the heartbeat
is fundamentally broken.

According to Daniel Vetter, the DRM ABI on this subject is that an
actively executing context should persist until the DRM file handle is
closed. That seems like a much more plausible and simple ABI th

Re: [PATCH v2 1/6] dt-bindings: drm/panel-simple: Introduce generic eDP panels

2021-08-09 Thread Rob Herring
On Mon, Aug 9, 2021 at 4:20 PM Doug Anderson  wrote:
>
> Hi,
>
> On Mon, Aug 2, 2021 at 6:39 AM Rob Herring  wrote:
> >
> > On Fri, 30 Jul 2021 14:26:20 -0700, Douglas Anderson wrote:
> > > eDP panels generally contain almost everything needed to control them
> > > in their EDID. This comes from their DP heritage where a computer needs
> > > to be able to properly control pretty much any DP display that's
> > > plugged into it.
> > >
> > > The one big issue with eDP panels and the reason that we need a panel
> > > driver for them is that the power sequencing can be different per
> > > panel.
> > >
> > > While it is true that eDP panel sequencing can be arbitrarily complex,
> > > in practice it turns out that many eDP panels are compatible with just
> > > some slightly different delays. See the contents of the bindings file
> > > introduced in this patch for some details.
> > >
> > > The fact that eDP panels are 99% probable and that the power
> > > sequencing (especially power up) can be compatible between many panels
> > > means that there's a constant desire to plug multiple different panels
> > > into the same board. This could be for second sourcing purposes or to
> > > support multiple SKUs (maybe a 11" and a 13", for instance).
> > >
> > > As discussed [1], it should be OK to support this by adding two
> > > properties to the device tree to specify the delays needed for
> > > powering up the panel the first time. We'll create a new "edp-panel"
> > > bindings file and define the two delays that might need to be
> > > specified. NOTE: in the vast majority of the cases (HPD is hooked up
> > > and isn't glitchy or is debounced) even these delays aren't needed.
> > >
> > > [1] 
> > > https://lore.kernel.org/r/CAD=FV=vzyompwqzzwdhjgh5cjjww_ecm-wqveivz-bdgxjp...@mail.gmail.com
> > >
> > > Signed-off-by: Douglas Anderson 
> > > ---
> > >
> > > Changes in v2:
> > > - No longer allow fallback to panel-simple.
> > > - Add "-ms" suffix to delays.
> > >
> > >  .../bindings/display/panel/panel-edp.yaml | 188 ++
> > >  1 file changed, 188 insertions(+)
> > >  create mode 100644 
> > > Documentation/devicetree/bindings/display/panel/panel-edp.yaml
> > >
> >
> > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> > on your patch (DT_CHECKER_FLAGS is new in v5.13):
> >
> > yamllint warnings/errors:
> >
> > dtschema/dtc warnings/errors:
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/panel/panel-edp.example.dt.yaml:
> >  bridge@2d: 'aux-bus' does not match any of the regexes: 'pinctrl-[0-9]+'
> > From schema: 
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml
> > \ndoc reference errors (make refcheckdocs):
> >
> > See https://patchwork.ozlabs.org/patch/1511822
> >
> > This check can fail if there are any dependencies. The base for a patch
> > series is generally the most recent rc1.
>
> I think it's a dependency problem. No hits here:
>
> git grep aux-bus v5.14-rc5 -- Documentation/devicetree/bindings/
>
> ...but I get hits against "linuxnext".

Am I supposed to figure them out? A simple "'aux-bus' warning is fixed
by commit XYZ in foo tree' in the patch would help. Then I won't send
the failure email (I do review them, so it's not your free testing
service :) ). If you list the dependency then I'm not going to spam
folks with failures. If you don't then I will so no one applies things
without dependencies (as often they are not queued).

> Rob: I'm hoping that this can
> still be in your queue for review even with the bot warning.

Sometimes, but you don't have to guess. You can look at patchwork.
Though there is a latency between sending failure emails and my
changing PW state.

In any case, it looks good to me.

Reviewed-by: Rob Herring 

Rob
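For reference, a board device tree using the generic panel under discussion might look roughly like the fragment below. The plain "edp-panel" compatible and the delay property name are my reading of the series and should be checked against the final panel-edp.yaml:

```dts
panel: edp-panel {
	compatible = "edp-panel";
	power-supply = <&pp3300_disp>;
	backlight = <&backlight>;

	/* Only needed when HPD is glitchy, debounced, or not hooked up */
	hpd-reliable-delay-ms = <15>;

	port {
		panel_in: endpoint {
			remote-endpoint = <&bridge_out>;
		};
	};
};
```

Everything else (modes, panel timings) would come from the EDID, which is the whole point of the binding.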


[PATCH] drm/amdgpu: Removed unnecessary if statement

2021-08-09 Thread Sergio Miguéns Iglesias
There was an "if" statement that did nothing so it was removed.

Signed-off-by: Sergio Miguéns Iglesias 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 09b048647523..5eb3869d029e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -273,9 +273,6 @@ static int amdgpufb_create(struct drm_fb_helper *helper,
return 0;
 
 out:
-   if (abo) {
-
-   }
if (fb && ret) {
drm_gem_object_put(gobj);
drm_framebuffer_unregister_private(fb);
-- 
2.32.0



Re: [PATCH v2 1/6] dt-bindings: drm/panel-simple: Introduce generic eDP panels

2021-08-09 Thread Doug Anderson
Hi,

On Mon, Aug 2, 2021 at 6:39 AM Rob Herring  wrote:
>
> On Fri, 30 Jul 2021 14:26:20 -0700, Douglas Anderson wrote:
> > eDP panels generally contain almost everything needed to control them
> > in their EDID. This comes from their DP heritage where a computer needs
> > to be able to properly control pretty much any DP display that's
> > plugged into it.
> >
> > The one big issue with eDP panels and the reason that we need a panel
> > driver for them is that the power sequencing can be different per
> > panel.
> >
> > While it is true that eDP panel sequencing can be arbitrarily complex,
> > in practice it turns out that many eDP panels are compatible with just
> > some slightly different delays. See the contents of the bindings file
> > introduced in this patch for some details.
> >
> > The fact that eDP panels are 99% probable and that the power
> > sequencing (especially power up) can be compatible between many panels
> > means that there's a constant desire to plug multiple different panels
> > into the same board. This could be for second sourcing purposes or to
> > support multiple SKUs (maybe a 11" and a 13", for instance).
> >
> > As discussed [1], it should be OK to support this by adding two
> > properties to the device tree to specify the delays needed for
> > powering up the panel the first time. We'll create a new "edp-panel"
> > bindings file and define the two delays that might need to be
> > specified. NOTE: in the vast majority of the cases (HPD is hooked up
> > and isn't glitchy or is debounced) even these delays aren't needed.
> >
> > [1] 
> > https://lore.kernel.org/r/CAD=FV=vzyompwqzzwdhjgh5cjjww_ecm-wqveivz-bdgxjp...@mail.gmail.com
> >
> > Signed-off-by: Douglas Anderson 
> > ---
> >
> > Changes in v2:
> > - No longer allow fallback to panel-simple.
> > - Add "-ms" suffix to delays.
> >
> >  .../bindings/display/panel/panel-edp.yaml | 188 ++
> >  1 file changed, 188 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/panel/panel-edp.yaml
> >
>
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
>
> yamllint warnings/errors:
>
> dtschema/dtc warnings/errors:
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/panel/panel-edp.example.dt.yaml:
>  bridge@2d: 'aux-bus' does not match any of the regexes: 'pinctrl-[0-9]+'
> From schema: 
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml
> \ndoc reference errors (make refcheckdocs):
>
> See https://patchwork.ozlabs.org/patch/1511822
>
> This check can fail if there are any dependencies. The base for a patch
> series is generally the most recent rc1.

I think it's a dependency problem. No hits here:

git grep aux-bus v5.14-rc5 -- Documentation/devicetree/bindings/

...but I get hits against "linuxnext". Rob: I'm hoping that this can
still be in your queue for review even with the bot warning. Thanks!
:-)

-Doug


Re: [PATCH v2 0/6] eDP: Support probing eDP panels dynamically instead of hardcoding

2021-08-09 Thread Doug Anderson
Hi,

On Tue, Aug 3, 2021 at 1:41 PM Sam Ravnborg  wrote:
>
> Hi Douglas,
>
> On Fri, Jul 30, 2021 at 02:26:19PM -0700, Douglas Anderson wrote:
> > The goal of this patch series is to move away from hardcoding exact
> > eDP panels in device tree files. As discussed in the various patches
> > in this series (I'm not repeating everything here), most eDP panels
> > are 99% probable and we can get that last 1% by allowing two "power
> > up" delays to be specified in the device tree file and then using the
> > panel ID (found in the EDID) to look up additional power sequencing
> > delays for the panel.
>
> Have you considered a new driver for edp panels?
> panel-edp.c?
>
> There will be some duplicate code from panel-simple - but the same can
> be said by the other panel drivers too.
> In the end I think it is better to separate them so we end up with two
> less complex panel drivers rather than one do-it-all panel driver.
>
> I have not looked in detail how this would look like, but my first
> impression is that we should split it out.

I certainly could, but my argument against it is that really it's the
exact same set of eDP panels that would be supported by both drivers.
By definition the "simple" eDP panels are the ones that just have a
regulator/enable GPIO and some timings to turn them off. Those are the
exact same set of panels that can be probed if we just provide the
panel ID that's hardcoded in the EDID. As you can see from the
implementation patch I'm actually sharing the private data structures
(the ones containing the timing) for panels that are supported both as
"probable" and as hardcoded. If we split into two drivers we'd either
need to duplicate the timings for all panels supported by both drivers
or we'd have to somehow export them (maybe hard if things are in
modules). Also, since it's the same set of eDP panels we'd need to
exactly duplicate all the code handling delays / HPD. It just doesn't
feel right to me.


-Doug


Re: [PATCH 00/11] Implement generic prot_guest_has() helper function

2021-08-09 Thread Tom Lendacky
On 8/8/21 8:41 PM, Kuppuswamy, Sathyanarayanan wrote:
> Hi Tom,
> 
> On 7/27/21 3:26 PM, Tom Lendacky wrote:
>> This patch series provides a generic helper function, prot_guest_has(),
>> to replace the sme_active(), sev_active(), sev_es_active() and
>> mem_encrypt_active() functions.
>>
>> It is expected that as new protected virtualization technologies are
>> added to the kernel, they can all be covered by a single function call
>> instead of a collection of specific function calls all called from the
>> same locations.
>>
>> The powerpc and s390 patches have been compile tested only. Can the
>> folks copied on this series verify that nothing breaks for them.
> 
> With this patch set, select ARCH_HAS_PROTECTED_GUEST and set
> CONFIG_AMD_MEM_ENCRYPT=n, creates following error.
> 
> ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data':
> arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted'
> 
> It looks like early_memremap_is_setup_data() is not protected with
> appropriate config.

Ok, thanks for finding that. I'll fix that.

Thanks,
Tom

> 
> 
>>
>> Cc: Andi Kleen 
>> Cc: Andy Lutomirski 
>> Cc: Ard Biesheuvel 
>> Cc: Baoquan He 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Borislav Petkov 
>> Cc: Christian Borntraeger 
>> Cc: Daniel Vetter 
>> Cc: Dave Hansen 
>> Cc: Dave Young 
>> Cc: David Airlie 
>> Cc: Heiko Carstens 
>> Cc: Ingo Molnar 
>> Cc: Joerg Roedel 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: Peter Zijlstra 
>> Cc: Thomas Gleixner 
>> Cc: Thomas Zimmermann 
>> Cc: Vasily Gorbik 
>> Cc: VMware Graphics 
>> Cc: Will Deacon 
>>
>> ---
>>
>> Patches based on:
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>> master
>>    commit 79e920060fa7 ("Merge branch 'WIP/fixes'")
>>
>> Tom Lendacky (11):
>>    mm: Introduce a function to check for virtualization protection
>>  features
>>    x86/sev: Add an x86 version of prot_guest_has()
>>    powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
>>    x86/sme: Replace occurrences of sme_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
>>    treewide: Replace the use of mem_encrypt_active() with
>>  prot_guest_has()
>>    mm: Remove the now unused mem_encrypt_active() function
>>    x86/sev: Remove the now unused mem_encrypt_active() function
>>    powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
>>  function
>>    s390/mm: Remove the now unused mem_encrypt_active() function
>>
>>   arch/Kconfig   |  3 ++
>>   arch/powerpc/include/asm/mem_encrypt.h |  5 --
>>   arch/powerpc/include/asm/protected_guest.h | 30 +++
>>   arch/powerpc/platforms/pseries/Kconfig |  1 +
>>   arch/s390/include/asm/mem_encrypt.h    |  2 -
>>   arch/x86/Kconfig   |  1 +
>>   arch/x86/include/asm/kexec.h   |  2 +-
>>   arch/x86/include/asm/mem_encrypt.h | 13 +
>>   arch/x86/include/asm/protected_guest.h | 27 ++
>>   arch/x86/kernel/crash_dump_64.c    |  4 +-
>>   arch/x86/kernel/head64.c   |  4 +-
>>   arch/x86/kernel/kvm.c  |  3 +-
>>   arch/x86/kernel/kvmclock.c |  4 +-
>>   arch/x86/kernel/machine_kexec_64.c | 19 +++
>>   arch/x86/kernel/pci-swiotlb.c  |  9 ++--
>>   arch/x86/kernel/relocate_kernel_64.S   |  2 +-
>>   arch/x86/kernel/sev.c  |  6 +--
>>   arch/x86/kvm/svm/svm.c |  3 +-
>>   arch/x86/mm/ioremap.c  | 16 +++---
>>   arch/x86/mm/mem_encrypt.c  | 60 +++---
>>   arch/x86/mm/mem_encrypt_identity.c |  3 +-
>>   arch/x86/mm/pat/set_memory.c   |  3 +-
>>   arch/x86/platform/efi/efi_64.c |  9 ++--
>>   arch/x86/realmode/init.c   |  8 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +-
>>   drivers/gpu/drm/drm_cache.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c    |  6 +--
>>   drivers/iommu/amd/init.c   |  7 +--
>>   drivers/iommu/amd/iommu.c  |  3 +-
>>   drivers/iommu/amd/iommu_v2.c   |  3 +-
>>   drivers/iommu/iommu.c  |  3 +-
>>   fs/proc/vmcore.c   |  6 +--
>>   include/linux/mem_encrypt.h    |  4 --
>>   incl

Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Kuppuswamy, Sathyanarayanan




On 8/9/21 2:59 PM, Tom Lendacky wrote:

Not sure how TDX will handle AP booting, are you sure it needs this
special setup as well? Otherwise a check for SEV-ES would be better
instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.


We don't plan to set PROT_STATE. So it does not affect TDX.
For SMP, we use MADT ACPI table for AP booting.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 8/2/21 7:42 AM, Christophe Leroy wrote:
> 
> 
> Le 28/07/2021 à 00:26, Tom Lendacky a écrit :
>> Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
>> with the PATTR_MEM_ENCRYPT attribute.
> 
> 
> What about
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-will@kernel.org/ ?

Ah, looks like that just went into the PPC tree and isn't part of the tip
tree. I'll have to look into how to handle that one.

Thanks,
Tom

> 
> Christophe
> 
> 
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Dave Hansen 
>> Cc: Andy Lutomirski 
>> Cc: Peter Zijlstra 
>> Cc: David Airlie 
>> Cc: Daniel Vetter 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Thomas Zimmermann 
>> Cc: VMware Graphics 
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Dave Young 
>> Cc: Baoquan He 
>> Signed-off-by: Tom Lendacky 
>> ---
>>   arch/x86/kernel/head64.c    | 4 ++--
>>   arch/x86/mm/ioremap.c   | 4 ++--
>>   arch/x86/mm/mem_encrypt.c   | 5 ++---
>>   arch/x86/mm/pat/set_memory.c    | 3 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
>>   drivers/gpu/drm/drm_cache.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
>>   drivers/iommu/amd/iommu.c   | 3 ++-
>>   drivers/iommu/amd/iommu_v2.c    | 3 ++-
>>   drivers/iommu/iommu.c   | 3 ++-
>>   fs/proc/vmcore.c    | 6 +++---
>>   kernel/dma/swiotlb.c    | 4 ++--
>>   13 files changed, 29 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index de01903c3735..cafed6456d45 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -19,7 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include 
>> +#include 
>>   #include 
>>     #include 
>> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long
>> physaddr,
>>    * there is no need to zero it after changing the memory encryption
>>    * attribute.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>>   vaddr = (unsigned long)__start_bss_decrypted;
>>   vaddr_end = (unsigned long)__end_bss_decrypted;
>>   for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0f2d5ace5986..5e1c1f5cbbe8 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -693,7 +693,7 @@ static bool __init
>> early_memremap_is_setup_data(resource_size_t phys_addr,
>>   bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned
>> long size,
>>    unsigned long flags)
>>   {
>> -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return true;
>>     if (flags & MEMREMAP_ENC)
>> @@ -723,7 +723,7 @@ pgprot_t __init
>> early_memremap_pgprot_adjust(resource_size_t phys_addr,
>>   {
>>   bool encrypted_prot;
>>   -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return prot;
>>     encrypted_prot = true;
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 451de8e84fce..0f1533dbe81c 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long
>> vaddr, unsigned long size)
>>   /*
>>    * SME and SEV are very similar but they are not the same, so there are
>>    * times that the kernel will need to distinguish between SME and SEV.
>> The
>> - * sme_active() and sev_active() functions are used for this.  When a
>> - * distinction isn't needed, the mem_encrypt_active() function can be
>> used.
>> + * sme_active() and sev_active() functions are used for this.
>>    *
>>    * The trampoline code is a good example for this requirement.  Before
>>    * paging is activated, SME will access all memory as decrypted, but SEV
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>    * The unused memory range was mapped decrypted, change the
>> encryption
>>    * attribute from decrypted to encrypted before freeing it.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (sme_me_mask) {
>>   r = set_memory_encrypted(vaddr, npages);
>>   if (r) {
>>   pr_warn("failed to free unused decrypted pages\n");
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c

[PATCH] drm/msm/dp: add drm debug logs to dp_pm_resume/suspend

2021-08-09 Thread Kuogee Hsieh
Add drm debug logs to dp_pm_resume and dp_pm_suspend to help
debug suspend/resume issues.

Fixes: 355ab7428f09 ("drm/msm/dp: add debug logs to dp_pm_resume/suspend")
Signed-off-by: Kuogee Hsieh 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index 8a85613..870b926 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1284,6 +1284,9 @@ static int dp_pm_resume(struct device *dev)
 
mutex_lock(&dp->event_mutex);
 
+   DRM_DEBUG_DP("Before, core_inited=%d power_on=%d\n",
+   dp->core_initialized, dp_display->power_on);
+
/* start from disconnected state */
dp->hpd_state = ST_DISCONNECTED;
 
@@ -1315,6 +1318,10 @@ static int dp_pm_resume(struct device *dev)
else
dp->dp_display.is_connected = false;
 
+	DRM_DEBUG_DP("After, sink_count=%d is_connected=%d core_inited=%d power_on=%d\n",
+		dp->link->sink_count, dp->dp_display.is_connected,
+		dp->core_initialized, dp_display->power_on);
+
mutex_unlock(&dp->event_mutex);
 
return 0;
@@ -1330,6 +1337,9 @@ static int dp_pm_suspend(struct device *dev)
 
mutex_lock(&dp->event_mutex);
 
+   DRM_DEBUG_DP("Before, core_inited=%d power_on=%d\n",
+   dp->core_initialized, dp_display->power_on);
+
if (dp->core_initialized == true) {
/* mainlink enabled */
if (dp_power_clk_status(dp->power, DP_CTRL_PM))
@@ -1343,6 +1353,9 @@ static int dp_pm_suspend(struct device *dev)
/* host_init will be called at pm_resume */
dp->core_initialized = false;
 
+   DRM_DEBUG_DP("After, core_inited=%d power_on=%d\n",
+   dp->core_initialized, dp_display->power_on);
+
mutex_unlock(&dp->event_mutex);
 
return 0;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 8/2/21 5:45 AM, Joerg Roedel wrote:
> On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
>> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct 
>> trampoline_header *th)
>>  if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>>  th->flags |= TH_FLAGS_SME_ACTIVE;
>>  
>> -if (sev_es_active()) {
>> +if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>>  /*
>>   * Skip the call to verify_cpu() in secondary_startup_64 as it
>>   * will cause #VC exceptions when the AP can't handle them yet.
> 
> Not sure how TDX will handle AP booting, are you sure it needs this
> special setup as well? Otherwise a check for SEV-ES would be better
> instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.

Thanks,
Tom

> 
> Regards,
> 
> Joerg
> 


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 7/30/21 5:34 PM, Sean Christopherson wrote:
> On Tue, Jul 27, 2021, Tom Lendacky wrote:
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>   * The unused memory range was mapped decrypted, change the encryption
>>   * attribute from decrypted to encrypted before freeing it.
>>   */
>> -if (mem_encrypt_active()) {
>> +if (sme_me_mask) {
> 
> Any reason this uses sme_me_mask?  The helper it calls, 
> __set_memory_enc_dec(),
> uses prot_guest_has(PATTR_MEM_ENCRYPT) so I assume it's available?

Probably just a slip on my part. I was debating at one point calling the
helper vs. referencing the variables/functions directly in the
mem_encrypt.c file.

Thanks,
Tom

> 
>>  r = set_memory_encrypted(vaddr, npages);
>>  if (r) {
>>  pr_warn("failed to free unused decrypted pages\n");
> 


Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 1:35 PM Caleb Connolly  wrote:
>
>
>
> On 09/08/2021 18:58, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 10:28 AM Akhil P Oommen  
> > wrote:
> >>
> >> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
> >>>
> >>>
> >>> On 09/08/2021 17:12, Rob Clark wrote:
>  On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen 
>  wrote:
> >
> > On 8/8/2021 10:22 PM, Rob Clark wrote:
> >> On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly
> >>  wrote:
> >>>
> >>>
> >>>
> >>> On 07/08/2021 21:04, Rob Clark wrote:
>  On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
>   wrote:
> >
> > Hi Rob, Akhil,
> >
> > On 29/07/2021 21:53, Rob Clark wrote:
> >> On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> >>  wrote:
> >>>
> >>>
> >>>
> >>> On 29/07/2021 21:24, Rob Clark wrote:
>  On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
>   wrote:
> >
> > Hi Rob,
> >
> > I've done some more testing! It looks like before that patch
> > ("drm/msm: Devfreq tuning") the GPU would never get above
> > the second frequency in the OPP table (342MHz) (at least, not
> > in glxgears). With the patch applied it would more
> > aggressively jump up to the max frequency which seems to be
> > unstable at the default regulator voltages.
> 
>  *ohh*, yeah, ok, that would explain it
> 
> > Hacking the pm8005 s1 regulator (which provides VDD_GFX) up
> > to 0.988v (instead of the stock 0.516v) makes the GPU stable
> > at the higher frequencies.
> >
> > Applying this patch reverts the behaviour, and the GPU never
> > goes above 342MHz in glxgears, losing ~30% performance in
> > > glxgears.
> >
> > I think (?) that enabling CPR support would be the proper
> > solution to this - that would ensure that the regulators run
> > at the voltage the hardware needs to be stable.
> >
> > Is hacking the voltage higher (although ideally not quite
> > that high) an acceptable short term solution until we have
> > CPR? Or would it be safer to just not make use of the higher
> > frequencies on a630 for now?
> >
> 
>  tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is
>  already
>  on CC and I added sboyd, maybe one of them knows better.
> 
>  In the short term, removing the higher problematic OPPs from
>  dts might
>  be a better option than this patch (which I'm dropping), since
>  there
>  is nothing stopping other workloads from hitting higher OPPs.
> >>> Oh yeah that sounds like a more sensible workaround than mine.
> 
>  I'm slightly curious why I didn't have problems at higher OPPs
>  on my
>  c630 laptop (sdm850)
> >>> Perhaps you won the silicon lottery - iirc sdm850 is binned
> >>> for higher clocks as is out of the factory.
> >>>
> >>> Would it be best to drop the OPPs for all devices? Or just
> >>> those affected? I guess it's possible another c630 might
> >>> crash where yours doesn't?
> >>
> >> I've not heard any reports of similar issues from the handful of
> >> other
> >> folks with c630's on #aarch64-laptops.. but I can't really say
> >> if that
> >> is luck or not.
> > It looks like this affects at least the OnePlus 6 and PocoPhone
> > F1, I've done some more poking and the following diff
> > seems to fix the stability issues completely, it seems the delay
> > is required to let the update propagate.
> >
> > This doesn't feel like the right fix, but hopefully it's enough
> > to come up with a better solution than disabling the new
> > devfreq behaviour on a630.
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index d7cec7f0dde0..69e2a5e84dae 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu,
> > struct dev_pm_opp *opp)
> >  return;
> >  }
> >
> > +   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > +
> > +   usleep_range(300, 500);
> > +
> 
> >
> > I am a bit confused. We don't define a power domain for gpu in dt,
> > correct? Then what exactly set_opp do here? Do you think 

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Caleb Connolly





I am a bit confused. We don't define a power domain for gpu in dt,
correct? Then what exactly set_opp do here? Do you think this usleep is
what is helping here somehow to mask the issue?

The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
the GPU DT. For the sake of simplicity I'll refer to the lowest
frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
the "min" state, and the highest frequency (71000) and OPP level
(RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
sdm845.dtsi under the gpu node.

The new devfreq behaviour unmasks what I think is a driver bug, it
inadvertently puts much more strain on the GPU regulators than they
usually get. With the new behaviour the GPU jumps from it's min state to
the max state and back again extremely rapidly under workloads as small
as refreshing UI. Where previously the GPU would rarely if ever go above
342MHz when interacting with the device, it now jumps between min and
max many times per second.

If my understanding is correct, the current implementation of the GMU
set freq is the following:
   - Get OPP for frequency to set
   - Push the frequency to the GMU - immediately updating the core clock
   - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds
up somewhere in power management code and causes the gx regulator level
to be updated


Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else.
We were using a different api earlier which got deprecated -
dev_pm_opp_set_bw().

Huh ok, thanks for the correction. So it's the GMU wr

Re: [Intel-gfx] [PATCH 3/3] drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:03:28PM +0200, Daniel Vetter wrote:
> On Sun, Aug 08, 2021 at 11:07:57AM -0700, Matthew Brost wrote:
> > While debugging an issue with full GT resets I went down a rabbit hole
> > thinking the scrubbing of lost G2H wasn't working correctly. This proved
> > to be incorrect as this was working just fine, but this chase inspired me
> > to write a selftest to prove that this works. This simple selftest
> > injects errors dropping various G2H and then issues a full GT reset
> > proving that the scrubbing of these G2H doesn't blow up.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  18 +++
> >  drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 126 ++
> >  .../drm/i915/selftests/i915_live_selftests.h  |   1 +
> >  .../i915/selftests/intel_scheduler_helpers.c  |  12 ++
> >  .../i915/selftests/intel_scheduler_helpers.h  |   2 +
> >  6 files changed, 163 insertions(+)
> >  create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index e54351a170e2..fec5ff7ef168 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -198,6 +198,10 @@ struct intel_context {
> >  */
> > u8 guc_prio;
> > u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> > +
> 
> I know the existing stuff isn't following this at all, but for anything
> new we really should put some kerneldoc into structures. This probably
> means you need to open-code the #ifdef here, since this macro will likely
> upset kerneldoc parsing.
> 

Ok, got it.

> > +   I915_SELFTEST_DECLARE(bool drop_schedule_enable);
> > +   I915_SELFTEST_DECLARE(bool drop_schedule_disable);
> > +   I915_SELFTEST_DECLARE(bool drop_deregister);
> >  };
> >  
> >  #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index cd8df078ca87..d13dc56bae43 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2618,6 +2618,11 @@ int intel_guc_deregister_done_process_msg(struct 
> > intel_guc *guc,
> >  
> > trace_intel_context_deregister_done(ce);
> >  
> > +   if (I915_SELFTEST_ONLY(ce->drop_deregister)) {
> > +   I915_SELFTEST_DECLARE(ce->drop_deregister = false;)
> 
> This macro wrapping is quite nasty, can't we just #ifdef this? Especially
> the _DECLARE name really doesn't expect a statement.
>

Had it like that originally, then remembered these macros; in the past
people have complained when I didn't use them, so yes, pretty much a
bikeshed. I personally like the ifdef myself.

Matt
 
> Aside from these bikesheds I don't have a much to say on the test logic
> itself, since I'm far from knowledgable on guc stuff ...
> -Daniel
> 
> 
> > +   return 0;
> > +   }
> > +
> > if (context_wait_for_deregister_to_register(ce)) {
> > struct intel_runtime_pm *runtime_pm =
> > &ce->engine->gt->i915->runtime_pm;
> > @@ -2672,10 +2677,19 @@ int intel_guc_sched_done_process_msg(struct 
> > intel_guc *guc,
> > trace_intel_context_sched_done(ce);
> >  
> > if (context_pending_enable(ce)) {
> > +   if (I915_SELFTEST_ONLY(ce->drop_schedule_enable)) {
> > +   I915_SELFTEST_DECLARE(ce->drop_schedule_enable = false;)
> > +   return 0;
> > +   }
> > clr_context_pending_enable(ce);
> > } else if (context_pending_disable(ce)) {
> > bool banned;
> >  
> > +   if (I915_SELFTEST_ONLY(ce->drop_schedule_disable)) {
> > +   I915_SELFTEST_DECLARE(ce->drop_schedule_disable = 
> > false;)
> > +   return 0;
> > +   }
> > +
> > /*
> >  * Unpin must be done before __guc_signal_context_fence,
> >  * otherwise a race exists between the requests getting
> > @@ -3047,3 +3061,7 @@ bool intel_guc_virtual_engine_has_heartbeat(const 
> > struct intel_engine_cs *ve)
> >  
> > return false;
> >  }
> > +
> > +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > +#include "selftest_guc.c"
> > +#endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c 
> > b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> > new file mode 100644
> > index ..46ca6554f65d
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> > @@ -0,0 +1,126 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2021 Intel Corporation
> > + */
> > +
> > +#include "selftests/intel_scheduler_helpers.h"
> > +
> > +static struct i915_request *nop_user_request(struct intel_context *ce,
> > +struct i915_request *from)
> > +{
> > +   struct i915_r

Re: [Intel-gfx] [PATCH 2/3] drm/i915/selftests: Fix memory corruption in live_lrc_isolation

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 03:38:38PM +0200, Daniel Vetter wrote:
> On Sun, Aug 08, 2021 at 11:07:56AM -0700, Matthew Brost wrote:
> > GuC submission has exposed an existing memory corruption in
> > live_lrc_isolation. We believe that some writes to the watchdog offsets
> > in the LRC (0x178 & 0x17c) can result in trashing of portions of the
> > address space. With GuC submission there are additional objects which
> > can move the context redzone into the space that is trashed. To
> > workaround this avoid poisoning the watchdog.
> 
> A Bspec reference here would be good (we can quote anything that's marked
> for public release, so doesn't have one of the IP markers).
>

Let me see what I dig up in the bspec.

BTW - Hopefully we can root cause this soon with a proper fix.
 
> Also I think the above should be replicated in condensed form instead of
> the XXX comment.
>

Sure.

Matt

> With those: Acked-by: Daniel Vetter  since I
> definitely don't have enough clue here for a detailed review.
> -Daniel
> 
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c | 29 +-
> >  1 file changed, 28 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
> > b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > index b0977a3b699b..6500e9fce8a0 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > @@ -1074,6 +1074,32 @@ record_registers(struct intel_context *ce,
> > goto err_after;
> >  }
> >  
> > +static u32 safe_offset(u32 offset, u32 reg)
> > +{
> > +   /* XXX skip testing of watchdog */
> > +   if (offset == 0x178 || offset == 0x17c)
> > +   reg = 0;
> > +
> > +   return reg;
> > +}
> > +
> > +static int get_offset_mask(struct intel_engine_cs *engine)
> > +{
> > +   if (GRAPHICS_VER(engine->i915) < 12)
> > +   return 0xfff;
> > +
> > +   switch (engine->class) {
> > +   default:
> > +   case RENDER_CLASS:
> > +   return 0x07ff;
> > +   case COPY_ENGINE_CLASS:
> > +   return 0x0fff;
> > +   case VIDEO_DECODE_CLASS:
> > +   case VIDEO_ENHANCEMENT_CLASS:
> > +   return 0x3fff;
> > +   }
> > +}
> > +
> >  static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
> >  {
> > struct i915_vma *batch;
> > @@ -1117,7 +1143,8 @@ static struct i915_vma *load_context(struct 
> > intel_context *ce, u32 poison)
> > len = (len + 1) / 2;
> > *cs++ = MI_LOAD_REGISTER_IMM(len);
> > while (len--) {
> > -   *cs++ = hw[dw];
> > +   *cs++ = safe_offset(hw[dw] & 
> > get_offset_mask(ce->engine),
> > +   hw[dw]);
> > *cs++ = poison;
> > dw += 2;
> > }
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH 1/3] drm/i915/guc: Fix several issues related to resets / request cancelation

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 03:35:26PM +0200, Daniel Vetter wrote:
> On Sun, Aug 08, 2021 at 11:07:55AM -0700, Matthew Brost wrote:
> > Resets are notoriously hard to get fully working and notoriously racy,
> > especially with selftests / IGTs that do all sorts of wild things that
> > would be near impossible to hit during normal use cases. Even though
> > likely impossible to hit, anything selftests / IGTs uncover needs to be
> > fixed. This patch addresses 7 such issues.
> > 
> > 1. A small race that could result in incorrect accounting of the number
> > of outstanding G2H. Basically prior to this patch we did not increment
> > the number of outstanding G2H if we encountered a GT reset while sending
> > a H2G. This was incorrect as the context state had already been updated
> > to anticipate a G2H response thus the counter should be incremented.
> > 
> > 2. When unwinding requests on a reset context, if other requests in the
> > context are in the priority list the requests could be resubmitted out
> > of seqno order. Traverse the list of active requests in reverse and
> > append to the head of the priority list to fix this.
> > 
> > 3. Don't drop ce->guc_active.lock when unwinding a context after reset.
> > At one point we had to drop this because of a lock inversion but that is
> > no longer the case. It is much safer to hold the lock so let's do that.
> > 
> > 4. Prior to this patch the blocked context counter was cleared on
> > init_sched_state (used during registering a context & resets) which is
> > incorrect. This state needs to be persistent or the counter can read the
> > incorrect value.
> > 
> > 5. Flush the work queue for GuC generated G2H messages during a GT reset.
> > 
> > 6. Do not clear enable during a context reset if a schedule enable is in
> > flight.
> > 
> > 7. When unblocking a context, do not enable scheduling if the context is
> > banned.
> 
> I think each of the above should be a separate patch. I think it would
> also be good if each fix references the commits that introduced/changed
> something.
>

Sure, I was just trying to cheat and make our lives easier with fewer
patches to backport into DII.
 
> Most of this stuff is extremely hard to get right, and unfortunately our
> current code is way too fond of lockless trickery (which really isn't a
> great idea in the reset code). We need to apply as much care as possible
> here.
>

Yep, resets are hard. They are hard because like ten other async things
(e.g. a new submission, registering a context, banning a context,
canceling a request, processing a G2H, trying to idle the GPU, unpinning
a context) can all be happening at the same time. Hopefully when we move
to the DRM scheduler we can remove some of these async operations;
perma-pinned contexts would also help. We have a story for that plus a
story to simplify the locking.

> Also expect me to ask a lot of annoying questions about all the atomic_t
> you touch :-)

Looking forward to it.

Matt

> -Daniel
> 
> 
> > 
> > Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in 
> > buffer")
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC 
> > interface")
> > Signed-off-by: Matthew Brost 
> > Cc: 
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 ---
> >  1 file changed, 27 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 87d8dc8f51b9..cd8df078ca87 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -152,7 +152,7 @@ static inline void init_sched_state(struct 
> > intel_context *ce)
> >  {
> > /* Only should be called from guc_lrc_desc_pin() */
> > atomic_set(&ce->guc_sched_state_no_lock, 0);
> > -   ce->guc_state.sched_state = 0;
> > +   ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK;
> >  }
> >  
> >  static inline bool
> > @@ -360,11 +360,13 @@ static int guc_submission_send_busy_loop(struct 
> > intel_guc *guc,
> >  {
> > int err;
> >  
> > -   err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > -
> > -   if (!err && g2h_len_dw)
> > +   if (g2h_len_dw)
> > atomic_inc(&guc->outstanding_submission_g2h);
> >  
> > +   err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > +   if (err == -EBUSY && g2h_len_dw)
> > +   atomic_dec(&guc->outstanding_submission_g2h);
> > +
> > return err;
> >  }
> >  
> > @@ -725,6 +727,11 @@ void intel_guc_submission_reset_prepare(struct 
> > intel_guc *guc)
> > wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > } while (!list_empty(&guc->ct.requests.incoming));
> > }
> > +
> > +   /* Flush any GuC generated G2H */
> > +   while (!list_empty(&guc->ct.requests.incoming))
> > +   msleep(1);
> > +
> > 

Re: [Intel-gfx] [PATCH 46/46] drm/i915/guc: Add delay before disabling scheduling on contexts

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 07:17:27PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:43PM -0700, Matthew Brost wrote:
> > Some workloads use lots of contexts that continually pin / unpin
> > contexts. With GuC submission an unpin translates to a schedule disable
> > H2G which puts pressure on both the i915 and GuC. A schedule disable can
> > also block future requests from being submitted until the operation
> > completes. None of this is ideal.
> > 
> > Add a configurable, via debugfs, delay period before the schedule
> > disable is issued. Default delay period is 1 second. The delay period is
> > skipped if more than 3/4 of the guc_ids are in use.
> > 
> > This patch also updates the selftests to turn off this delay period as
> > this extra time would likely cause many selftests to fail. Follow up
> > patches will fix all the selftests and enable the delay period.
> > 
> > Signed-off-by: Matthew Brost 
> 
> I think this is more evidence that we should just pin/unpin context at
> create/destruction time. The current scheme doesn't really work that well
> and causes way more pain than benefits it seems.
> 

Well that choice is above my pay grade, but for what it is worth it
would simplify the GuC backend quite a bit if we perma-pin contexts. By
quite a bit, I actually mean a lot of complexity goes away.

In the meantime I think we probably need this code though to avoid
thrashing the scheduling enable / disable.

Matt

> If anyone screams, and that's a big if aside of some igts, we can come up
> with a proper scheme to evict contexts without pin/unpin and layer hacks
> over that misdesign.
> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
> >  .../i915/gem/selftests/i915_gem_coherency.c   |   2 +-
> >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
> >  .../drm/i915/gem/selftests/i915_gem_mman.c|   2 +-
> >  .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
> >  drivers/gpu/drm/i915/gt/intel_context.h   |   9 +
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   7 +
> >  .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c|  28 ++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 322 +-
> >  .../i915/gt/uc/selftest_guc_flow_control.c|  19 +-
> >  drivers/gpu/drm/i915/i915_selftest.h  |   2 +
> >  drivers/gpu/drm/i915/i915_trace.h |  10 +
> >  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
> >  drivers/gpu/drm/i915/selftests/i915_perf.c|   2 +-
> >  drivers/gpu/drm/i915/selftests/i915_request.c |   2 +-
> >  drivers/gpu/drm/i915/selftests/i915_vma.c |   2 +-
> >  18 files changed, 405 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index b199d59bd2c4..1553287e5491 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1298,7 +1298,7 @@ static void engines_idle_release(struct 
> > i915_gem_context *ctx,
> > int err;
> >  
> > /* serialises with execbuf */
> > -   set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
> > +   intel_context_close(ce);
> > if (!intel_context_pin_if_active(ce))
> > continue;
> >  
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> > index 13b088cc787e..a666d7e610f5 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> > @@ -434,5 +434,5 @@ int i915_gem_coherency_live_selftests(struct 
> > drm_i915_private *i915)
> > SUBTEST(igt_gem_coherency),
> > };
> >  
> > -   return i915_subtests(tests, i915);
> > +   return i915_live_subtests(tests, i915);
> >  }
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> > index ffae7df5e4d7..2c92afa9d608 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> > @@ -474,5 +474,5 @@ int i915_gem_dmabuf_live_selftests(struct 
> > drm_i915_private *i915)
> > SUBTEST(igt_dmabuf_import_same_driver_lmem_smem),
> > };
> >  
> > -   return i915_subtests(tests, i915);
> > +   return i915_live_subtests(tests, i915);
> >  }
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > index b20f5621f62b..4745c78a48de 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> > @@ -1414,5 +1414,5 @@ int i915_gem_mman_live_selftests(struct 
> > drm_i915_private *i915)
> > SUBTEST(igt_mmap_gpu),

Re: [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 06:36:44PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> > Display the workqueue status in debugfs for GuC contexts that are in
> > a parent-child relationship.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +--
> >  1 file changed, 39 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 30df1c8db491..44a7582c9aed 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct 
> > intel_guc *guc,
> > gse_log_submission_info(guc->gse[i], p, i);
> >  }
> >  
> > +static inline void guc_log_context(struct drm_printer *p,
> > +  struct intel_context *ce)
> > +{
> > +   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > +   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > +   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > +  ce->ring->head,
> > +  ce->lrc_reg_state[CTX_RING_HEAD]);
> > +   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > +  ce->ring->tail,
> > +  ce->lrc_reg_state[CTX_RING_TAIL]);
> > +   drm_printf(p, "\t\tContext Pin Count: %u\n",
> > +  atomic_read(&ce->pin_count));
> > +   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > +  atomic_read(&ce->guc_id_ref));
> > +   drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > +  atomic_read(&ce->guc_num_rq_not_ready));
> > +   drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > +  ce->guc_state.sched_state,
> > +  atomic_read(&ce->guc_sched_state_no_lock));
> 
> It's all debugfs, but I think proper locking even there is good. It at
> least reduces the confusion when the locking scheme is largely
> undocumented. Also given how much we use rcu for everything, it would be
> good to double-check all pointer dereferences are properly protected.
>

Not sure if I 100% follow this, but I don't think any of the pointer
derefs here are RCU protected. Certainly none of the GuC ones are.

Will double-check before the next respin though.

> > +}
> > +
> >  void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >  struct drm_printer *p)
> >  {
> > struct intel_context *ce;
> > unsigned long index;
> > xa_for_each(&guc->context_lookup, index, ce) {
> 
> xa_for_each doesn't provide any guarantees, so doesn't protect against
> concurrent removeal or anything like that. We need to do better than that.

https://elixir.bootlin.com/linux/latest/source/include/linux/xarray.h#L498
'It is safe to modify the array during the iteration.'

Matt

> -Daniel
> 
> > -   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> > -   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> > -   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> > -  ce->ring->head,
> > -  ce->lrc_reg_state[CTX_RING_HEAD]);
> > -   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> > -  ce->ring->tail,
> > -  ce->lrc_reg_state[CTX_RING_TAIL]);
> > -   drm_printf(p, "\t\tContext Pin Count: %u\n",
> > -  atomic_read(&ce->pin_count));
> > -   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> > -  atomic_read(&ce->guc_id_ref));
> > -   drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> > -  atomic_read(&ce->guc_num_rq_not_ready));
> > -   drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> > -  ce->guc_state.sched_state,
> > -  atomic_read(&ce->guc_sched_state_no_lock));
> > +   GEM_BUG_ON(intel_context_is_child(ce));
> >  
> > +   guc_log_context(p, ce);
> > guc_log_context_priority(p, ce);
> > +
> > +   if (intel_context_is_parent(ce)) {
> > +   struct guc_process_desc *desc = __get_process_desc(ce);
> > +   struct intel_context *child;
> > +
> > +   drm_printf(p, "\t\tWQI Head: %u\n",
> > +  READ_ONCE(desc->head));
> > +   drm_printf(p, "\t\tWQI Tail: %u\n",
> > +  READ_ONCE(desc->tail));
> > +   drm_printf(p, "\t\tWQI Status: %u\n\n",
> > +  READ_ONCE(desc->wq_status));
> > +
> > +   for_each_child(ce, child)
> > +   guc_log_context(p, child);
> > +   }
> > }
> >  }
> >  
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 21/46] drm/i915/guc: Add guc_child_context_destroy

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 05:36:12PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:18PM -0700, Matthew Brost wrote:
> > Since child contexts do not own the guc_ids or GuC context registration,
> > child contexts can simply be freed on destroy. Add
> > guc_child_context_destroy context operation to do this.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 2d8296bcc583..850edeff9230 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2828,6 +2828,13 @@ static void destroy_worker_func(struct work_struct 
> > *w)
> > intel_gt_pm_unpark_work_add(gt, destroy_worker);
> >  }
> >  
> > +/* Future patches will use this function */
> > +__maybe_unused
> 
> Pure bikeshed, but for something this small just squash it in with the
> first user. This kinda does nothing alone.
> -Daniel
> 

Sure.

Matt

> > +static void guc_child_context_destroy(struct kref *kref)
> > +{
> > +   __guc_context_destroy(container_of(kref, struct intel_context, ref));
> > +}
> > +
> >  static void guc_context_destroy(struct kref *kref)
> >  {
> > struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH 20/46] drm/i915/guc: Add hang check to GuC submit engine

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 05:35:25PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:17PM -0700, Matthew Brost wrote:
> > The heartbeat uses a single instance of a GuC submit engine (GSE) to do
> > the hang check. As such if a different GSE's state machine hangs, the
> > heartbeat cannot detect this hang. Add timer to each GSE which in turn
> > can disable all submissions if it is hung.
> > 
> > Cc: John Harrison 
> > Signed-off-by: Matthew Brost 
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++
> >  .../i915/gt/uc/intel_guc_submission_types.h   |  3 ++
> >  2 files changed, 39 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index afb9b4bb8971..2d8296bcc583 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -105,15 +105,21 @@ static bool tasklet_blocked(struct guc_submit_engine 
> > *gse)
> > return test_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
> >  }
> >  
> > +/* 2 seconds seems like a reasonable timeout waiting for a G2H */
> > > +#define MAX_TASKLET_BLOCKED_NS 2000000000
> >  static void set_tasklet_blocked(struct guc_submit_engine *gse)
> >  {
> > lockdep_assert_held(&gse->sched_engine.lock);
> > +   hrtimer_start_range_ns(&gse->hang_timer,
> > +  ns_to_ktime(MAX_TASKLET_BLOCKED_NS), 0,
> > +  HRTIMER_MODE_REL_PINNED);
> > set_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
> 
> So with drm/scheduler the reset handling is assumed to be
> single-threaded, and there's quite complex rules around that. I've
> recently worked with Boris Brezillion to clarify all this a bit and
> improve docs. Does this all still work in that glorious future? Might be
> good to at least sprinkle some comments/thoughts around in the commit
> message about the envisaged future direction for all this stuff, to keep
> people in the loop. Especially future people.
> 
> Ofc plan is still to just largely land all this.
> 
> Also: set_bit is an unordered atomic, which means you need barriers, which
> meanes ... *insert the full rant about justifying/documenting lockless
> algorithms from earlier *
>

lockdep_assert_held(&gse->sched_engine.lock);

Not lockless. Also spin locks act as barriers, right?
 
> But I think this all falls out with the removal of the guc-id allocation
> scheme?

Yes, this patch is getting deleted.

Matt

> -Daniel
> 
> >  }
> >  
> >  static void __clr_tasklet_blocked(struct guc_submit_engine *gse)
> >  {
> > lockdep_assert_held(&gse->sched_engine.lock);
> > +   hrtimer_cancel(&gse->hang_timer);
> > clear_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
> >  }
> >  
> > @@ -1028,6 +1034,7 @@ static void disable_submission(struct intel_guc *guc)
> > if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > GEM_BUG_ON(!guc->ct.enabled);
> > __tasklet_disable_sync_once(&sched_engine->tasklet);
> > +   hrtimer_try_to_cancel(&guc->gse[i]->hang_timer);
> > sched_engine->tasklet.callback = NULL;
> > }
> > }
> > @@ -3750,6 +3757,33 @@ static void guc_sched_engine_destroy(struct kref 
> > *kref)
> > kfree(gse);
> >  }
> >  
> > +static enum hrtimer_restart gse_hang(struct hrtimer *hrtimer)
> > +{
> > +   struct guc_submit_engine *gse =
> > +   container_of(hrtimer, struct guc_submit_engine, hang_timer);
> > +   struct intel_guc *guc = gse->sched_engine.private_data;
> > +
> > +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > +   if (guc->gse_hang_expected)
> > +   drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +   "GSE[%i] hung, disabling submission", gse->id);
> > +   else
> > +   drm_err(&guc_to_gt(guc)->i915->drm,
> > +   "GSE[%i] hung, disabling submission", gse->id);
> > +#else
> > +   drm_err(&guc_to_gt(guc)->i915->drm,
> > +   "GSE[%i] hung, disabling submission", gse->id);
> > +#endif
> > +
> > +   /*
> > +* Tasklet not making forward progress, disable submission which in turn
> > +* will kick in the heartbeat to do a full GPU reset.
> > +*/
> > +   disable_submission(guc);
> > +
> > +   return HRTIMER_NORESTART;
> > +}
> > +
> >  static void guc_submit_engine_init(struct intel_guc *guc,
> >struct guc_submit_engine *gse,
> >int id)
> > @@ -3767,6 +3801,8 @@ static void guc_submit_engine_init(struct intel_guc 
> > *guc,
> > sched_engine->retire_inflight_request_prio =
> > guc_retire_inflight_request_prio;
> > sched_engine->private_data = guc;
> > +   hrtimer_init(&gse->hang_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > +   gse->hang_timer.function = gse_hang;
> > gse->id = id;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h 
> > 

Re: [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 05:31:38PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:16PM -0700, Matthew Brost wrote:
> > Assign contexts in parent-child relationship consecutive guc_ids. This
> > is accomplished by partitioning guc_id space between ones that need to
> > be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
> > available guc_ids). The consecutive search is implemented via the bitmap
> > API.
> > 
> > This is a precursor to the full GuC multi-lrc implementation but aligns
> > to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
> > when using the GuC multi-lrc interface.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.h   |   6 +
> >  drivers/gpu/drm/i915/gt/intel_reset.c |   3 +-
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   7 +-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 222 --
> >  .../i915/gt/uc/intel_guc_submission_types.h   |  10 +
> >  5 files changed, 179 insertions(+), 69 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index c208691fc87d..7ce3b3d2edb7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -54,6 +54,12 @@ static inline bool intel_context_is_parent(struct 
> > intel_context *ce)
> > return !!ce->guc_number_children;
> >  }
> >  
> > +static inline struct intel_context *
> > +intel_context_to_parent(struct intel_context *ce)
> > +{
> > +   return intel_context_is_child(ce) ? ce->parent : ce;
> > +}
> > +
> >  void intel_context_bind_parent_child(struct intel_context *parent,
> >  struct intel_context *child);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
> > b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index ea763138197f..c3d4baa1b2b8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -849,6 +849,7 @@ static void reset_finish(struct intel_gt *gt, 
> > intel_engine_mask_t awake)
> >  
> >  static void nop_submit_request(struct i915_request *request)
> >  {
> > +   struct intel_context *ce = intel_context_to_parent(request->context);
> > RQ_TRACE(request, "-EIO\n");
> >  
> > /*
> > @@ -857,7 +858,7 @@ static void nop_submit_request(struct i915_request 
> > *request)
> >  * this for now.
> >  */
> > if (intel_engine_uses_guc(request->engine))
> > -   intel_guc_decr_num_rq_not_ready(request->context);
> > +   intel_guc_decr_num_rq_not_ready(ce);
> >  
> > request = i915_request_mark_eio(request);
> > if (request) {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index c0c60ccabfa4..30a0f364db8f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -24,6 +24,7 @@ struct __guc_ads_blob;
> >  
> >  enum {
> > GUC_SUBMIT_ENGINE_SINGLE_LRC,
> > +   GUC_SUBMIT_ENGINE_MULTI_LRC,
> > GUC_SUBMIT_ENGINE_MAX
> >  };
> >  
> > @@ -59,8 +60,10 @@ struct intel_guc {
> > struct ida guc_ids;
> > u32 num_guc_ids;
> > u32 max_guc_ids;
> > -   struct list_head guc_id_list_no_ref;
> > -   struct list_head guc_id_list_unpinned;
> > +   unsigned long *guc_ids_bitmap;
> > +#define MAX_GUC_ID_ORDER   (order_base_2(MAX_ENGINE_INSTANCE + 1))
> > +   struct list_head guc_id_list_no_ref[MAX_GUC_ID_ORDER + 1];
> > +   struct list_head guc_id_list_unpinned[MAX_GUC_ID_ORDER + 1];
> 
> Random new global lists definitely need kerneldoc about what is on them,
> how they're linked, what their lifetime rules are and what locks we're
> holding.
> 
> Leaving this all to reviews to figure out, and worse, future readers of
> your code, is not kind.
>

Got it.
 
> > spinlock_t destroy_lock;/* protects list / worker */
> > struct list_head destroyed_contexts;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index f23dd716723f..afb9b4bb8971 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -169,6 +169,15 @@ static void clr_guc_ids_exhausted(struct 
> > guc_submit_engine *gse)
> > clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
> >  }
> >  
> > +/*
> > + * We reserve 1/16 of the guc_ids for multi-lrc as these need to be 
> > contiguous
> 
> I think it'd be good to put down the reason here for why. Is this a
> requirement of the guc interface, or just an artifact of our current
> implementation? In the latter case also explain what exactly the
> constraint is (but honestly I can't think of many reasons for that)

Multi-lrc guc_ids need to be sequential between the parent and children
- this is a requirement of the GuC submission interface. Can explicitly
state that here.

Matt


Re: [PATCH v2 0/9] PCI/VGA: Rework default VGA device selection

2021-08-09 Thread Bjorn Helgaas
On Tue, Aug 03, 2021 at 12:06:44PM -0500, Bjorn Helgaas wrote:
> On Sat, Jul 24, 2021 at 05:30:02PM +0800, Huacai Chen wrote:
> > Hi, Bjorn,
> > 
> > On Sat, Jul 24, 2021 at 8:10 AM Bjorn Helgaas  wrote:
> > >
> > > On Fri, Jul 23, 2021 at 05:53:36PM +0800, Huacai Chen wrote:
> > > > Hi, Bjorn,
> > > >
> > > > On Fri, Jul 23, 2021 at 5:29 AM Bjorn Helgaas  
> > > > wrote:
> > > > >
> > > > > From: Bjorn Helgaas 
> > > > >
> > > > > This is a little bit of rework and extension of Huacai's nice work at 
> > > > > [1].
> > > > >
> > > > > It moves the VGA arbiter to the PCI subsystem, fixes a few nits, and 
> > > > > breaks
> > > > > a few pieces off Huacai's patch to make the main patch a little 
> > > > > smaller.
> > > > >
> > > > > That last patch is still not very small, and it needs a commit log, 
> > > > > as I
> > > > > mentioned at [2].
> > > > >
> > > > > All comments welcome!
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/dri-devel/20210705100503.1120643-1-chenhua...@loongson.cn/
> > > > > [2] 
> > > > > https://lore.kernel.org/r/20210720221923.GA43331@bjorn-Precision-5520
> > > > Thank you for your splitting. Your two questions are answered in the 
> > > > following.
> > > >
> > > > (1) explain why your initcall ordering is unusual.
> > > > The original problem happens on MIPS. vga_arb_device_init() and
> > > > pcibios_init() are both wrapped by subsys_initcall(). The order of
> > > > functions in the same level depends on the Makefile.
> > > >
> > > > TOP level Makefile:
> > > > drivers-y   := drivers/ sound/
> > > > 
> > > > include arch/$(SRCARCH)/Makefile
> > > >
> > > > drivers/Makefile:
> > > > obj-$(CONFIG_ACPI)  += acpi/
> > > > 
> > > > obj-y   += gpu/
> > > >
> > > > arch/mips/Makefile:
> > > > drivers-$(CONFIG_PCI)   += arch/mips/pci/
> > > >
> > > > This makes pcibios_init() in arch/mips/pci/ placed after
> > > > vga_arb_device_init() in drivers/gpu. ACPI-based systems have no
> > > > problems because acpi_init() in drivers/acpi is placed before
> > > > vga_arb_device_init().
> > >
> > > Thanks for the above; that was helpful.  To summarize:
> > >
> > >   - On your system, the AST2500 bridge [1a03:1150] does not implement
> > > PCI_BRIDGE_CTL_VGA [1].  This is perfectly legal but means the
> > > legacy VGA resources won't reach downstream devices unless they're
> > > included in the usual bridge windows.
> > >
> > >   - vga_arb_select_default_device() will set a device below such a
> > > bridge as the default VGA device as long as it has PCI_COMMAND_IO
> > > and PCI_COMMAND_MEMORY enabled.
> > >
> > >   - vga_arbiter_add_pci_device() is called for every VGA device,
> > > either at boot-time or at hot-add time, and it will also set the
> > > device as the default VGA device, but ONLY if all bridges leading
> > > to it implement PCI_BRIDGE_CTL_VGA.
> > >
> > >   - This difference between vga_arb_select_default_device() and
> > > vga_arbiter_add_pci_device() means that a device below an AST2500
> > > or similar bridge can only be set as the default if it is
> > > enumerated before vga_arb_device_init().
> > >
> > >   - On ACPI-based systems, PCI devices are enumerated by acpi_init(),
> > > which runs before vga_arb_device_init().
> > >
> > >   - On non-ACPI systems, like your MIPS system, they are enumerated by
> > > pcibios_init(), which typically runs *after*
> > > vga_arb_device_init().
> > >
> > > So I think the critical change is actually that you made
> > > vga_arb_update_default_device(), which you call from
> > > vga_arbiter_add_pci_device(), set the default device even if it does
> > > not own the VGA resources because an upstream bridge doesn't implement
> > > PCI_BRIDGE_CTL_VGA, i.e.,
> > >
> > >   (vgadev->owns & VGA_RSRC_LEGACY_MASK) != VGA_RSRC_LEGACY_MASK
> > >
> > > Does that seem right?
> >
> > Yes, that's right.
> 
> I think that means I screwed up.  I somehow had it in my head that the
> hot-add path would never set the default VGA device.  But that is
> false.
> 
> I still think we should move vgaarb.c to drivers/pci/ and get it more
> tightly integrated into the PCI core.
> 
> BUT that's a lot of churn and obscures the simple change that fixes
> the problem for you.  So I think the first step should be the change
> to vga_arb_update_default_device() so it sets the default device even
> when the upstream bridge doesn't implement PCI_BRIDGE_CTL_VGA.
> 
> That should be a relatively small change, and I think it's better to
> make the fix before embarking on major restructuring.

To make sure this doesn't get lost: I'm hoping you can separate out
and post the small patch to vga_arb_update_default_device().

I can look at the move/restructure stuff later.

> > > [1] 
> > > https://lore.kernel.org/r/caahv-h4pn53xc7qvvwm792ppkqrnjwppdwmrhbv8twgqu0e...@mail.gmail.com
> > >
> > > > (2) explain the approach, which IIUC is basically to add the
> > > > vga_a

Re: [Intel-gfx] [PATCH 16/46] drm/i915/guc: Implement GuC parent-child context pin / unpin functions

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 05:17:34PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:13PM -0700, Matthew Brost wrote:
> > Implement GuC parent-child context pin / unpin functions in which in any
> > contexts in the relationship are pinned all the contexts are pinned. The
> > parent owns most of the pinning / unpinning process and the children
> > direct any pins / unpins to the parent.
> > 
> > Patch implements a number of unused functions that will be connected
> > later in the series.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c   | 187 --
> >  drivers/gpu/drm/i915/gt/intel_context.h   |  43 +---
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +-
> >  .../drm/i915/gt/intel_execlists_submission.c  |  25 ++-
> >  drivers/gpu/drm/i915/gt/intel_lrc.c   |  26 +--
> >  drivers/gpu/drm/i915/gt/intel_lrc.h   |   6 +-
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |   5 +-
> >  drivers/gpu/drm/i915/gt/mock_engine.c |   4 +-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 183 +++--
> >  9 files changed, 371 insertions(+), 112 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 8cb92b10b547..bb4c14656067 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -158,8 +158,8 @@ static void __ring_retire(struct intel_ring *ring)
> > intel_ring_unpin(ring);
> >  }
> >  
> > -static int intel_context_pre_pin(struct intel_context *ce,
> > -struct i915_gem_ww_ctx *ww)
> > +static int __intel_context_pre_pin(struct intel_context *ce,
> > +  struct i915_gem_ww_ctx *ww)
> >  {
> > int err;
> >  
> > @@ -190,7 +190,7 @@ static int intel_context_pre_pin(struct intel_context 
> > *ce,
> > return err;
> >  }
> >  
> > -static void intel_context_post_unpin(struct intel_context *ce)
> > +static void __intel_context_post_unpin(struct intel_context *ce)
> >  {
> > if (ce->state)
> > __context_unpin_state(ce->state);
> > @@ -199,13 +199,85 @@ static void intel_context_post_unpin(struct 
> > intel_context *ce)
> > __ring_retire(ce->ring);
> >  }
> >  
> > -int __intel_context_do_pin_ww(struct intel_context *ce,
> > - struct i915_gem_ww_ctx *ww)
> > +static int intel_context_pre_pin(struct intel_context *ce,
> > +struct i915_gem_ww_ctx *ww)
> >  {
> > -   bool handoff = false;
> > -   void *vaddr;
> > +   struct intel_context *child;
> > +   int err, i = 0;
> > +
> > +   GEM_BUG_ON(intel_context_is_child(ce));
> > +
> > +   for_each_child(ce, child) {
> > +   err = __intel_context_pre_pin(child, ww);
> > +   if (unlikely(err))
> > +   goto unwind;
> > +   ++i;
> > +   }
> > +
> > +   err = __intel_context_pre_pin(ce, ww);
> > +   if (unlikely(err))
> > +   goto unwind;
> > +
> > +   return 0;
> > +
> > +unwind:
> > +   for_each_child(ce, child) {
> > +   if (!i--)
> > +   break;
> > +   __intel_context_post_unpin(ce);
> > +   }
> > +
> > +   return err;
> > +}
> > +
> > +static void intel_context_post_unpin(struct intel_context *ce)
> > +{
> > +   struct intel_context *child;
> > +
> > +   GEM_BUG_ON(intel_context_is_child(ce));
> > +
> > +   for_each_child(ce, child)
> > +   __intel_context_post_unpin(child);
> > +
> > +   __intel_context_post_unpin(ce);
> > +}
> > +
> > +static int __do_ww_lock(struct intel_context *ce,
> > +   struct i915_gem_ww_ctx *ww)
> > +{
> > +   int err = i915_gem_object_lock(ce->timeline->hwsp_ggtt->obj, ww);
> > +
> > +   if (!err && ce->ring->vma->obj)
> > +   err = i915_gem_object_lock(ce->ring->vma->obj, ww);
> > +   if (!err && ce->state)
> > +   err = i915_gem_object_lock(ce->state->obj, ww);
> > +
> > +   return err;
> > +}
> > +
> > +static int do_ww_lock(struct intel_context *ce,
> > + struct i915_gem_ww_ctx *ww)
> > +{
> > +   struct intel_context *child;
> > int err = 0;
> >  
> > +   GEM_BUG_ON(intel_context_is_child(ce));
> > +
> > +   for_each_child(ce, child) {
> > +   err = __do_ww_lock(child, ww);
> > +   if (unlikely(err))
> > +   return err;
> > +   }
> > +
> > +   return __do_ww_lock(ce, ww);
> > +}
> > +
> > +static int __intel_context_do_pin_ww(struct intel_context *ce,
> > +struct i915_gem_ww_ctx *ww)
> > +{
> > +   bool handoff = false;
> > +   int err;
> > +
> > if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
> > err = intel_context_alloc_state(ce);
> > if (err)
> > @@ -217,14 +289,11 @@ int __intel_context_do_pin_ww(struct intel_context 
> > *ce,
> >  * refcount for __intel_context_active(), which prevent a lock
> >  * inversion of 

[PATCH v2 1/2] dt-bindings: add bindings for the Sharp LS060T1SX01 panel

2021-08-09 Thread Dmitry Baryshkov
Add devicetree bindings for the Sharp LS060T1SX01 6.0" FullHD panel
using the NT35695 driver IC. This panel can be found e.g. in the Dragonboard
Display Adapter bundle.

Signed-off-by: Dmitry Baryshkov 
---
 .../display/panel/sharp,ls060t1sx01.yaml  | 51 +++
 1 file changed, 51 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml

diff --git 
a/Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml 
b/Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml
new file mode 100644
index ..c4af5e7f6f39
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/panel/sharp,ls060t1sx01.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sharp Microelectronics 6.0" FullHD TFT LCD panel
+
+maintainers:
  - Dmitry Baryshkov 
+
+allOf:
+  - $ref: panel-common.yaml#
+
+properties:
+  compatible:
+const: sharp,ls060t1sx01
+
+  reg: true
+  backlight: true
+  reset-gpios: true
+  port: true
+
+  avdd-supply:
+description: handle of the regulator that provides the supply voltage
+
+required:
+  - compatible
+  - reg
+  - avdd-supply
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+
+dsi {
+#address-cells = <1>;
+#size-cells = <0>;
+
+panel@0 {
+compatible = "sharp,ls060t1sx01";
+reg = <0>;
+avdd-supply = <&pm8941_l22>;
+backlight = <&backlight>;
+reset-gpios = <&pm8916_gpios 25 GPIO_ACTIVE_LOW>;
+};
+};
+
+...
-- 
2.30.2



[PATCH v2 2/2] drm/panel: Add support for Sharp LS060T1SX01 panel

2021-08-09 Thread Dmitry Baryshkov
Add a driver to support the Sharp LS060T1SX01 FullHD panel. The panel uses
the NT35695 driver IC. For example, this LCD module can be found in the
kwaek.ca Dragonboard Display Adapter Bundle.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/panel/Kconfig |  10 +
 drivers/gpu/drm/panel/Makefile|   1 +
 .../gpu/drm/panel/panel-sharp-ls060t1sx01.c   | 274 ++
 3 files changed, 285 insertions(+)
 create mode 100644 drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c

diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index 4894913936e9..08f85a5ff738 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -451,6 +451,16 @@ config DRM_PANEL_SHARP_LS043T1LE01
  Say Y here if you want to enable support for Sharp LS043T1LE01 qHD
  (540x960) DSI panel as found on the Qualcomm APQ8074 Dragonboard
 
+config DRM_PANEL_SHARP_LS060T1SX01
+   tristate "Sharp LS060T1SX01 FullHD video mode panel"
+   depends on OF
+   depends on DRM_MIPI_DSI
+   depends on BACKLIGHT_CLASS_DEVICE
+   help
+ Say Y here if you want to enable support for Sharp LS060T1SX01 6.0"
+ FullHD (1080x1920) DSI panel as found in Dragonboard Display Adapter
+ Bundle.
+
 config DRM_PANEL_SITRONIX_ST7701
tristate "Sitronix ST7701 panel driver"
depends on OF
diff --git a/drivers/gpu/drm/panel/Makefile b/drivers/gpu/drm/panel/Makefile
index cae4d976c069..7dd6bd755e13 100644
--- a/drivers/gpu/drm/panel/Makefile
+++ b/drivers/gpu/drm/panel/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_DRM_PANEL_SEIKO_43WVF1G) += panel-seiko-43wvf1g.o
 obj-$(CONFIG_DRM_PANEL_SHARP_LQ101R1SX01) += panel-sharp-lq101r1sx01.o
 obj-$(CONFIG_DRM_PANEL_SHARP_LS037V7DW01) += panel-sharp-ls037v7dw01.o
 obj-$(CONFIG_DRM_PANEL_SHARP_LS043T1LE01) += panel-sharp-ls043t1le01.o
+obj-$(CONFIG_DRM_PANEL_SHARP_LS060T1SX01) += panel-sharp-ls060t1sx01.o
 obj-$(CONFIG_DRM_PANEL_SITRONIX_ST7701) += panel-sitronix-st7701.o
 obj-$(CONFIG_DRM_PANEL_SITRONIX_ST7703) += panel-sitronix-st7703.o
 obj-$(CONFIG_DRM_PANEL_SITRONIX_ST7789V) += panel-sitronix-st7789v.o
diff --git a/drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c 
b/drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c
new file mode 100644
index ..4fece00e6156
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Linaro Ltd.
+// Generated with linux-mdss-dsi-panel-driver-generator from vendor device 
tree:
+//   Copyright (c) 2013-2014, The Linux Foundation. All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+struct sharp_ls060 {
+   struct drm_panel panel;
+   struct mipi_dsi_device *dsi;
+   struct regulator *supply;
+   struct gpio_desc *reset_gpio;
+   bool prepared;
+};
+
+static inline struct sharp_ls060 *to_sharp_ls060(struct drm_panel *panel)
+{
+   return container_of(panel, struct sharp_ls060, panel);
+}
+
+#define dsi_dcs_write_seq(dsi, seq...) ({  \
+   static const u8 d[] = { seq };  \
+   \
+   mipi_dsi_dcs_write_buffer(dsi, d, ARRAY_SIZE(d));   \
+   })
+
+static void sharp_ls060_reset(struct sharp_ls060 *ctx)
+{
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(10000, 11000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 1);
+   usleep_range(10000, 11000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(10000, 11000);
+}
+
+static int sharp_ls060_on(struct sharp_ls060 *ctx)
+{
+   struct mipi_dsi_device *dsi = ctx->dsi;
+   struct device *dev = &dsi->dev;
+   int ret;
+
+   dsi->mode_flags |= MIPI_DSI_MODE_LPM;
+
+   ret = dsi_dcs_write_seq(dsi, 0xbb, 0x13);
+   if (ret < 0) {
+   dev_err(dev, "Failed to send command: %d\n", ret);
+   return ret;
+   }
+
+   ret = dsi_dcs_write_seq(dsi, MIPI_DCS_WRITE_MEMORY_START);
+   if (ret < 0) {
+   dev_err(dev, "Failed to send command: %d\n", ret);
+   return ret;
+   }
+
+   ret = mipi_dsi_dcs_exit_sleep_mode(dsi);
+   if (ret < 0) {
+   dev_err(dev, "Failed to exit sleep mode: %d\n", ret);
+   return ret;
+   }
+   msleep(120);
+
+   ret = mipi_dsi_dcs_set_display_on(dsi);
+   if (ret < 0) {
+   dev_err(dev, "Failed to set display on: %d\n", ret);
+   return ret;
+   }
+   msleep(50);
+
+   return 0;
+}
+
+static int sharp_ls060_off(struct sharp_ls060 *ctx)
+{
+   struct mipi_dsi_device *dsi = ctx->dsi;
+   struct device *dev = &dsi->dev;
+   int ret;
+
+   dsi->mode_flags &= ~MIPI_DSI_MODE_LPM;
+
+   ret = mipi_dsi_dcs_set_display_off(ds

[PATCH v2 0/2] Add support for Sharp LS060T1SX01 panel

2021-08-09 Thread Dmitry Baryshkov
Add a driver to support the Sharp LS060T1SX01 6.0" FullHD panel found e.g. in
the kwaek.ca Dragonboard Display Adapter Bundle.

Changes since v1:
 - Fix the id in the schema file


Dmitry Baryshkov (2):
  dt-bindings: add bindings for the Sharp LS060T1SX01 panel
  drm/panel: Add support for Sharp LS060T1SX01 panel

 .../bindings/display/panel/sharp,ls060t1sx01.yaml  |  51 
 drivers/gpu/drm/panel/Kconfig  |  10 +
 drivers/gpu/drm/panel/Makefile |   1 +
 drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c| 274 +
 4 files changed, 336 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml
 create mode 100644 drivers/gpu/drm/panel/panel-sharp-ls060t1sx01.c




Re: [PATCH] drm/i915: Release ctx->syncobj on final put, not on ctx close

2021-08-09 Thread Daniel Vetter
On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand  wrote:
>
> On August 6, 2021 15:18:59 Daniel Vetter  wrote:
>
>> gem context refcounting is another exercise in least locking design it
>> seems, where most things get destroyed upon context closure (which can
>> race with anything really). Only the actual memory allocation and the
>> locks survive while holding a reference.
>>
>> This tripped up Jason when reimplementing the single timeline feature
>> in
>>
>> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
>> Author: Jason Ekstrand 
>> Date:   Thu Jul 8 10:48:12 2021 -0500
>>
>> drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
>>
>> We could fix the bug by holding ctx->mutex, but it's cleaner to just
>
>
> What bug is this fixing, exactly?

Oh lol I was all busy ranting and not explaining :-) I found it while
auditing other context stuff, so that other patch has the longer
commit message with more history, but that patch is also now tied into
the vm-dercuify, so short summary: You put the syncobj in context
close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final
kref_put. Which means you're open to a use-after-free if you race
against an execbuf. ctx->vm is equally broken (but for other ioctl),
once this fix here is merged I send out the ctx->vm fix because that's
tied into the vm-dercuify now due to conflicts.
-Daniel

>
> --Jason
>
>>
>> make the context object actually invariant over its _entire_ lifetime.
>>
>> Signed-off-by: Daniel Vetter 
>> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj 
>> (v4)")
>> Cc: Jason Ekstrand 
>> Cc: Chris Wilson 
>> Cc: Tvrtko Ursulin 
>> Cc: Joonas Lahtinen 
>> Cc: Matthew Brost 
>> Cc: Matthew Auld 
>> Cc: Maarten Lankhorst 
>> Cc: "Thomas Hellström" 
>> Cc: Lionel Landwerlin 
>> Cc: Dave Airlie 
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> index 754b9b8d4981..93ba0197d70a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
>>   trace_i915_context_free(ctx);
>>   GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
>>
>> + if (ctx->syncobj)
>> + drm_syncobj_put(ctx->syncobj);
>> +
>>   mutex_destroy(&ctx->engines_mutex);
>>   mutex_destroy(&ctx->lut_mutex);
>>
>> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
>>   if (vm)
>>   i915_vm_close(vm);
>>
>> - if (ctx->syncobj)
>> - drm_syncobj_put(ctx->syncobj);
>> -
>>   ctx->file_priv = ERR_PTR(-EBADF);
>>
>>   /*
>> --
>> 2.32.0
>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
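The lifetime rule behind this fix is general: anything an outstanding ioctl may still dereference has to be torn down in the final kref release callback, not at close time, because close only severs the userspace handle while other reference holders may still be running. A standalone sketch of the pattern (plain userspace C with made-up names, not i915 code):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for drm_syncobj / i915_gem_context. */
struct syncobj { int fence; };

struct ctx {
	int refcount;            /* models the kref */
	struct syncobj *syncobj; /* the resource at issue in the patch */
};

static struct ctx *ctx_create(void)
{
	struct ctx *c = malloc(sizeof(*c));

	c->refcount = 1; /* creation reference */
	c->syncobj = malloc(sizeof(*c->syncobj));
	c->syncobj->fence = 42;
	return c;
}

static void ctx_get(struct ctx *c) { c->refcount++; }

/* The fixed scheme: free the syncobj only in the final release. */
static void ctx_release(struct ctx *c)
{
	free(c->syncobj);
	free(c);
}

static void ctx_put(struct ctx *c)
{
	if (--c->refcount == 0)
		ctx_release(c);
}

/* Close only drops the creation reference; it must not free anything
 * that a concurrent reference holder (an in-flight "execbuf") may touch.
 */
static void ctx_close(struct ctx *c)
{
	ctx_put(c);
}
```

The buggy variant would have freed the syncobj inside ctx_close(); a holder of an extra reference would then be left with a dangling pointer, which is exactly the race described above.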


Re: [Intel-gfx] [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:40:11PM +0200, Daniel Vetter wrote:
> On Mon, Aug 09, 2021 at 04:37:55PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:12PM -0700, Matthew Brost wrote:
> > > Introduce a context parent-child relationship. Once this relationship is
> > > created, all pinning / unpinning operations are directed to the parent
> > > context. The parent context is responsible for pinning all of its
> > > children and itself.
> > > 
> > > This is a precursor to the full GuC multi-lrc implementation but aligns
> > > to how the GuC multi-lrc interface is defined - a single H2G is used to
> > > register / deregister all of the contexts simultaneously.
> > > 
> > > Subsequent patches in the series will implement the pinning / unpinning
> > > operations for parent / child contexts.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_context.c   | 29 +++
> > >  drivers/gpu/drm/i915/gt/intel_context.h   | 18 
> > >  drivers/gpu/drm/i915/gt/intel_context_types.h | 12 
> > >  3 files changed, 59 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index 745e84c72c90..8cb92b10b547 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -395,6 +395,8 @@ intel_context_init(struct intel_context *ce, struct 
> > > intel_engine_cs *engine)
> > >   spin_lock_init(&ce->guc_state.lock);
> > >   INIT_LIST_HEAD(&ce->guc_state.fences);
> > >  
> > > + INIT_LIST_HEAD(&ce->guc_child_list);
> > > +
> > >   spin_lock_init(&ce->guc_active.lock);
> > >   INIT_LIST_HEAD(&ce->guc_active.requests);
> > >  
> > > @@ -414,10 +416,17 @@ intel_context_init(struct intel_context *ce, struct 
> > > intel_engine_cs *engine)
> > >  
> > >  void intel_context_fini(struct intel_context *ce)
> > >  {
> > > + struct intel_context *child, *next;
> > > +
> > >   if (ce->timeline)
> > >   intel_timeline_put(ce->timeline);
> > >   i915_vm_put(ce->vm);
> > >  
> > > + /* Need to put the creation ref for the children */
> > > + if (intel_context_is_parent(ce))
> > > + for_each_child_safe(ce, child, next)
> > > + intel_context_put(child);
> > > +
> > >   mutex_destroy(&ce->pin_mutex);
> > >   i915_active_fini(&ce->active);
> > >  }
> > > @@ -533,6 +542,26 @@ struct i915_request 
> > > *intel_context_find_active_request(struct intel_context *ce)
> > >   return active;
> > >  }
> > >  
> > > +void intel_context_bind_parent_child(struct intel_context *parent,
> > > +  struct intel_context *child)
> > > +{
> > > + /*
> > > +  * It is the caller's responsibility to validate that this function is
> > > +  * used correctly, but we use GEM_BUG_ON here to ensure that they do.
> > > +  */
> > > + GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
> > > + GEM_BUG_ON(intel_context_is_pinned(parent));
> > > + GEM_BUG_ON(intel_context_is_child(parent));
> > > + GEM_BUG_ON(intel_context_is_pinned(child));
> > > + GEM_BUG_ON(intel_context_is_child(child));
> > > + GEM_BUG_ON(intel_context_is_parent(child));
> > > +
> > > + parent->guc_number_children++;
> > > + list_add_tail(&child->guc_child_link,
> > > +   &parent->guc_child_list);
> > > + child->parent = parent;
> > > +}
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > >  #include "selftest_context.c"
> > >  #endif
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index c41098950746..ad6ce5ac4824 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -44,6 +44,24 @@ void intel_context_free(struct intel_context *ce);
> > >  int intel_context_reconfigure_sseu(struct intel_context *ce,
> > >  const struct intel_sseu sseu);
> > >  
> > > +static inline bool intel_context_is_child(struct intel_context *ce)
> > > +{
> > > + return !!ce->parent;
> > > +}
> > > +
> > > +static inline bool intel_context_is_parent(struct intel_context *ce)
> > > +{
> > > + return !!ce->guc_number_children;
> > > +}
> > > +
> > > +void intel_context_bind_parent_child(struct intel_context *parent,
> > > +  struct intel_context *child);
> > > +
> > > +#define for_each_child(parent, ce)\
> > > + list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
> > > +#define for_each_child_safe(parent, ce, cn)\
> > > + list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, 
> > > guc_child_link)
> > > +
> > >  /**
> > >   * intel_context_lock_pinned - Stablises the 'pinned' status of the HW 
> > > context
> > >   * @ce - the context
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > index 2df79ba39867..66b22b370a72 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > 

Re: [Intel-gfx] [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:37:55PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:12PM -0700, Matthew Brost wrote:
> > Introduce a context parent-child relationship. Once this relationship is
> > created, all pinning / unpinning operations are directed to the parent
> > context. The parent context is responsible for pinning all of its
> > children and itself.
> > 
> > This is a precursor to the full GuC multi-lrc implementation but aligns
> > to how the GuC multi-lrc interface is defined - a single H2G is used to
> > register / deregister all of the contexts simultaneously.
> > 
> > Subsequent patches in the series will implement the pinning / unpinning
> > operations for parent / child contexts.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c   | 29 +++
> >  drivers/gpu/drm/i915/gt/intel_context.h   | 18 
> >  drivers/gpu/drm/i915/gt/intel_context_types.h | 12 
> >  3 files changed, 59 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 745e84c72c90..8cb92b10b547 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -395,6 +395,8 @@ intel_context_init(struct intel_context *ce, struct 
> > intel_engine_cs *engine)
> > spin_lock_init(&ce->guc_state.lock);
> > INIT_LIST_HEAD(&ce->guc_state.fences);
> >  
> > +   INIT_LIST_HEAD(&ce->guc_child_list);
> > +
> > spin_lock_init(&ce->guc_active.lock);
> > INIT_LIST_HEAD(&ce->guc_active.requests);
> >  
> > @@ -414,10 +416,17 @@ intel_context_init(struct intel_context *ce, struct 
> > intel_engine_cs *engine)
> >  
> >  void intel_context_fini(struct intel_context *ce)
> >  {
> > +   struct intel_context *child, *next;
> > +
> > if (ce->timeline)
> > intel_timeline_put(ce->timeline);
> > i915_vm_put(ce->vm);
> >  
> > +   /* Need to put the creation ref for the children */
> > +   if (intel_context_is_parent(ce))
> > +   for_each_child_safe(ce, child, next)
> > +   intel_context_put(child);
> > +
> > mutex_destroy(&ce->pin_mutex);
> > i915_active_fini(&ce->active);
> >  }
> > @@ -533,6 +542,26 @@ struct i915_request 
> > *intel_context_find_active_request(struct intel_context *ce)
> > return active;
> >  }
> >  
> > +void intel_context_bind_parent_child(struct intel_context *parent,
> > +struct intel_context *child)
> > +{
> > +   /*
> > +* It is the caller's responsibility to validate that this function is
> > +* used correctly, but we use GEM_BUG_ON here to ensure that they do.
> > +*/
> > +   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
> > +   GEM_BUG_ON(intel_context_is_pinned(parent));
> > +   GEM_BUG_ON(intel_context_is_child(parent));
> > +   GEM_BUG_ON(intel_context_is_pinned(child));
> > +   GEM_BUG_ON(intel_context_is_child(child));
> > +   GEM_BUG_ON(intel_context_is_parent(child));
> > +
> > +   parent->guc_number_children++;
> > +   list_add_tail(&child->guc_child_link,
> > + &parent->guc_child_list);
> > +   child->parent = parent;
> > +}
> > +
> >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >  #include "selftest_context.c"
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index c41098950746..ad6ce5ac4824 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -44,6 +44,24 @@ void intel_context_free(struct intel_context *ce);
> >  int intel_context_reconfigure_sseu(struct intel_context *ce,
> >const struct intel_sseu sseu);
> >  
> > +static inline bool intel_context_is_child(struct intel_context *ce)
> > +{
> > +   return !!ce->parent;
> > +}
> > +
> > +static inline bool intel_context_is_parent(struct intel_context *ce)
> > +{
> > +   return !!ce->guc_number_children;
> > +}
> > +
> > +void intel_context_bind_parent_child(struct intel_context *parent,
> > +struct intel_context *child);
> > +
> > +#define for_each_child(parent, ce)\
> > +   list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
> > +#define for_each_child_safe(parent, ce, cn)\
> > +   list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, 
> > guc_child_link)
> > +
> >  /**
> >   * intel_context_lock_pinned - Stablises the 'pinned' status of the HW 
> > context
> >   * @ce - the context
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 2df79ba39867..66b22b370a72 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -202,6 +202,18 @@ struct intel_context {
> > /* GuC context blocked fence */
> > struct i915_sw_fence guc_blocked;
> >  
> > +   /* Head of children list or link 
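The bookkeeping being reviewed here — a parent holds a child count plus a child list, children point back at the parent, and the is-parent / is-child predicates are derived from those fields — can be modeled in a self-contained sketch (plain C with made-up names, mirroring but not reproducing the intel_context structures; a singly linked list stands in for the kernel's list_head):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the intel_context parent/child bookkeeping. */
struct mctx {
	struct mctx *parent;       /* NULL unless this context is a child */
	unsigned int num_children; /* non-zero only on a parent */
	struct mctx *next_child;   /* sibling link, used on children */
	struct mctx *child_list;   /* head of the child list, on the parent */
};

static int is_child(const struct mctx *ce)  { return ce->parent != NULL; }
static int is_parent(const struct mctx *ce) { return ce->num_children != 0; }

static void bind_parent_child(struct mctx *parent, struct mctx *child)
{
	/* Mirrors the GEM_BUG_ON sanity checks: no nesting allowed. */
	assert(!is_child(parent));
	assert(!is_child(child) && !is_parent(child));

	parent->num_children++;
	child->next_child = parent->child_list;
	parent->child_list = child;
	child->parent = parent;
}
```

Note how the predicates fall out of the fields themselves: a context with a non-NULL parent is a child, one with a non-zero child count is a parent, and a freshly initialized context is neither.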

Re: [PATCH] drm/amdgpu: Removed unnecessary if statement

2021-08-09 Thread Alex Deucher
On Mon, Aug 9, 2021 at 9:59 AM Sergio Miguéns Iglesias
 wrote:
>
> There was an "if" statement that did nothing so it was removed.
>
> Signed-off-by: Sergio Miguéns Iglesias 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> index 09b048647523..5eb3869d029e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> @@ -273,9 +273,6 @@ static int amdgpufb_create(struct drm_fb_helper *helper,
> return 0;
>
>  out:
> -   if (abo) {
> -
> -   }
> if (fb && ret) {
> drm_gem_object_put(gobj);
> drm_framebuffer_unregister_private(fb);
> --
> 2.32.0
>


Re: [PATCH 14/46] drm/i915: Expose logical engine instance to user

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:30:06PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:11PM -0700, Matthew Brost wrote:
> > Expose the logical engine instance to userspace via the query engine info
> > IOCTL. This is required for split-frame workloads, as these need to be
> > placed on engines in a logically contiguous order. The logical mapping
> > can change based on fusing. Rather than requiring userspace to have
> > knowledge of the fusing, we simply expose the logical mapping with the
> > existing query engine info IOCTL.
> > 
> > Cc: Tvrtko Ursulin 
> > Signed-off-by: Matthew Brost 
> 
> Uapi must have a link to the userspace MR/patch set using this, and to the
> igt patch set validating it.
> 

Have an IGT:
https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1

Not sure when the media UMD is going to be updated upstream to use this.
Does that mean I can't merge this until the media UMD is ready? It seems
like it, but isn't that a circular dependency? How can the media team
develop against a new uAPI that isn't in the kernel yet?

For what it is worth, the downstream release is already using this.

Matt

> Ideally in each patch, since it's unfortunately way too hard to find the
> cover letter later on.
> 
> Jason even went as far as making this a hard requirement because he wasted
> a bit too much time trying to find the userspace for new uapi:
> 
> https://lore.kernel.org/dri-devel/20210804185704.624883-1-ja...@jlekstrand.net/
> 
> Cheers, Daniel
> 
> >---
> >  drivers/gpu/drm/i915/i915_query.c | 2 ++
> >  include/uapi/drm/i915_drm.h   | 8 +++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_query.c 
> > b/drivers/gpu/drm/i915/i915_query.c
> > index e49da36c62fb..8a72923fbdba 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
> > for_each_uabi_engine(engine, i915) {
> > info.engine.engine_class = engine->uabi_class;
> > info.engine.engine_instance = engine->uabi_instance;
> > +   info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
> > info.capabilities = engine->uabi_capabilities;
> > +   info.logical_instance = ilog2(engine->logical_mask);
> >  
> > if (copy_to_user(info_ptr, &info, sizeof(info)))
> > return -EFAULT;
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 7f13d241417f..ef72e07fe08c 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -2706,14 +2706,20 @@ struct drm_i915_engine_info {
> >  
> > /** @flags: Engine flags. */
> > __u64 flags;
> > +#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE  (1 << 0)
> >  
> > /** @capabilities: Capabilities of this engine. */
> > __u64 capabilities;
> >  #define I915_VIDEO_CLASS_CAPABILITY_HEVC   (1 << 0)
> >  #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC(1 << 1)
> >  
> > +   /** @logical_instance: Logical instance of engine */
> > +   __u16 logical_instance;
> > +
> > /** @rsvd1: Reserved fields. */
> > -   __u64 rsvd1[4];
> > +   __u16 rsvd1[3];
> > +   /** @rsvd2: Reserved fields. */
> > +   __u64 rsvd2[3];
> >  };
> >  
> >  /**
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
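One detail of the uapi encoding worth noting: the kernel keeps a single-bit logical_mask per engine and reports ilog2() of it, so userspace receives the logical instance as a plain bit-index integer. A small illustration of that round trip (the helper below is a stand-in written for this sketch, not the kernel's ilog2() implementation):

```c
#include <assert.h>

/* Stand-in for ilog2() on a power-of-two mask: returns the bit index. */
static unsigned int mask_to_instance(unsigned long mask)
{
	unsigned int n = 0;

	/* Shift the single set bit down, counting positions. */
	while (mask >>= 1)
		n++;
	return n;
}
```

So an engine set up with logical_mask = BIT(3) is reported to userspace as logical_instance == 3, which is what lets media stacks pick logically contiguous engines without knowing the fusing.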


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-10 00:00, Rob Clark wrote:

On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
 wrote:


On 2021-08-09 23:37, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
>  wrote:
>>
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>> >> >
>> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  
wrote:
>> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
wrote:
>> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash 
Ranjan wrote:
>> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
Ranjan wrote:
>> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
IOMMU_SYS_CACHE_ONLY flag")
>> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
along with it went
>> >> > > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
>> >> > > > > > > > > > > system cache. Now that system cache support for GPU 
is added, we will
>> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers 
to be sys cached.
>> >> > > > > > > > > > > Without this, the system cache lines are not 
allocated for GPU.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > So the patches in this series introduces a new prot 
flag IOMMU_LLC,
>> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
>> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
it, as it does
>> >> > > > > > > > > > not apply anymore?
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no 
changes needed, then
>> >> > > > > > > > > I can repost the patch.
>> >> > > > > > > >
>> >> > > > > > > > I still think you need to handle the mismatched alias, no? 
You're adding
>> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the 
CPU side. That
>> >> > > > > > > > can't be right.
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > > > Just curious, and maybe this is a dumb question, but what is 
your
>> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy 
on the
>> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different 
and
>> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> > > > > >
>> >> > > > > > If the CPU accesses a shared buffer with different attributes 
to those which
>> >> > > > > > the device is using then you fall into the "mismatched memory 
attributes"
>> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
should go and
>> >> > > > > > read it) and in some cases can apply to speculative accesses as 
well, but
>> >> > > > > > the end result is typically loss of coherency.
>> >> > > > >
>> >> > > > > Ok, I might have a few other sections to read first to decipher 
the
>> >> > > > > terminology..
>> >> > > > >
>> >> > > > > But my understanding of LLC is that it looks just like system 
memory
>> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't 
it be
>> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> > > >
>> >> > > > You could certainly build a system where mismatched attributes 
don't cause
>> >> > > > loss of coherence, but as it's not guaranteed by the architecture 
and the
>> >> > > > changes proposed here affect APIs which are exposed across SoCs, 
then I
>> >> > > > don't think it helps much.
>> >> > > >
>> >> > >
>> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> > > to transparent outer level cache:
>> >> > >
>> >> > > +/*
>> >> > > + * Non-coherent masters can use this page protection flag to set 
cacheable
>> >> > > + * memory attributes for only a transparent outer level of cache, 
also known as
>> >> > > + * the last-level or system cache.
>> >> > > + */
>> >> > > +#define IOMMU_LLC  (1 << 6)
>> >> > >
>> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> > > like that to make it more clear that it is not necessarily something
>> >> > > that would work with a different outer level cache implementation?
>> >> >
>> >> > ... or we could just deal with the problem so that other people can 
reuse
>> >> > the code. I haven't really understood the reluctance to solve this 
properly.
>> >> >
>> >> > Am I missing some reason

Re: [PATCH 13/46] drm/i915: Add logical engine mapping

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:28:04PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:10PM -0700, Matthew Brost wrote:
> > Add logical engine mapping. This is required for split-frame, as
> > workloads need to be placed on engines in a logically contiguous manner.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 60 ---
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
> >  .../drm/i915/gt/intel_execlists_submission.c  |  1 +
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  2 +-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 21 +--
> >  5 files changed, 56 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 0d9105a31d84..4d790f9a65dd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -290,7 +290,8 @@ static void nop_irq_handler(struct intel_engine_cs 
> > *engine, u16 iir)
> > GEM_DEBUG_WARN_ON(iir);
> >  }
> >  
> > -static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> > +static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
> > + u8 logical_instance)
> >  {
> > const struct engine_info *info = &intel_engines[id];
> > struct drm_i915_private *i915 = gt->i915;
> > @@ -334,6 +335,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> > intel_engine_id id)
> >  
> > engine->class = info->class;
> > engine->instance = info->instance;
> > +   engine->logical_mask = BIT(logical_instance);
> > __sprint_engine_name(engine);
> >  
> > engine->props.heartbeat_interval_ms =
> > @@ -572,6 +574,37 @@ static intel_engine_mask_t init_engine_mask(struct 
> > intel_gt *gt)
> > return info->engine_mask;
> >  }
> >  
> > +static void populate_logical_ids(struct intel_gt *gt, u8 *logical_ids,
> > +u8 class, const u8 *map, u8 num_instances)
> > +{
> > +   int i, j;
> > +   u8 current_logical_id = 0;
> > +
> > +   for (j = 0; j < num_instances; ++j) {
> > +   for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> > +   if (!HAS_ENGINE(gt, i) ||
> > +   intel_engines[i].class != class)
> > +   continue;
> > +
> > +   if (intel_engines[i].instance == map[j]) {
> > +   logical_ids[intel_engines[i].instance] =
> > +   current_logical_id++;
> > +   break;
> > +   }
> > +   }
> > +   }
> > +}
> > +
> > +static void setup_logical_ids(struct intel_gt *gt, u8 *logical_ids, u8 
> > class)
> > +{
> > +   int i;
> > +   u8 map[MAX_ENGINE_INSTANCE + 1];
> > +
> > +   for (i = 0; i < MAX_ENGINE_INSTANCE + 1; ++i)
> > +   map[i] = i;
> > +   populate_logical_ids(gt, logical_ids, class, map, ARRAY_SIZE(map));
> > +}
> > +
> >  /**
> >   * intel_engines_init_mmio() - allocate and prepare the Engine Command 
> > Streamers
> >   * @gt: pointer to struct intel_gt
> > @@ -583,7 +616,8 @@ int intel_engines_init_mmio(struct intel_gt *gt)
> > struct drm_i915_private *i915 = gt->i915;
> > const unsigned int engine_mask = init_engine_mask(gt);
> > unsigned int mask = 0;
> > -   unsigned int i;
> > +   unsigned int i, class;
> > +   u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
> > int err;
> >  
> > drm_WARN_ON(&i915->drm, engine_mask == 0);
> > @@ -593,15 +627,23 @@ int intel_engines_init_mmio(struct intel_gt *gt)
> > if (i915_inject_probe_failure(i915))
> > return -ENODEV;
> >  
> > -   for (i = 0; i < ARRAY_SIZE(intel_engines); i++) {
> > -   if (!HAS_ENGINE(gt, i))
> > -   continue;
> > +   for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
> > +   setup_logical_ids(gt, logical_ids, class);
> >  
> > -   err = intel_engine_setup(gt, i);
> > -   if (err)
> > -   goto cleanup;
> > +   for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> > +   u8 instance = intel_engines[i].instance;
> > +
> > +   if (intel_engines[i].class != class ||
> > +   !HAS_ENGINE(gt, i))
> > +   continue;
> >  
> > -   mask |= BIT(i);
> > +   err = intel_engine_setup(gt, i,
> > +logical_ids[instance]);
> > +   if (err)
> > +   goto cleanup;
> > +
> > +   mask |= BIT(i);
> > +   }
> > }
> >  
> > /*
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index ed91bcff20eb..85e5c9a9e502 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -
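The remapping done by populate_logical_ids() above boils down to: walk the candidate instances of a class in map order and hand out logical ids 0, 1, 2, ... to only the instances that survived fusing, so the logical numbering stays contiguous even when the physical numbering has holes. A standalone sketch of that compaction (illustrative C, not the driver code):

```c
#include <assert.h>

#define MAX_INSTANCE 8

/*
 * present[i] says whether physical instance i survived fusing; on return,
 * logical_ids[i] holds its compacted logical id (untouched when absent).
 */
static void assign_logical_ids(const int *present, unsigned char *logical_ids)
{
	unsigned char next = 0;
	int i;

	for (i = 0; i < MAX_INSTANCE; i++)
		if (present[i])
			logical_ids[i] = next++;
}
```

For example, with physical instances 0, 2, 3 and 7 present, they get logical ids 0, 1, 2 and 3 respectively — contiguous despite the fused-off gaps.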

Re: [PATCH] drm/amdgpu: fix kernel-doc warnings on non-kernel-doc comments

2021-08-09 Thread Alex Deucher
On Sat, Aug 7, 2021 at 7:38 PM Randy Dunlap  wrote:
>
> Don't use "begin kernel-doc notation" (/**) for comments that are
> not kernel-doc. This eliminates warnings reported by the 0day bot.
>
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:89: warning: This comment starts with 
> '/**', but isn't a kernel-doc comment. Refer 
> Documentation/doc-guide/kernel-doc.rst
> * This shader is used to clear VGPRS and LDS, and also write the input
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:209: warning: This comment starts 
> with '/**', but isn't a kernel-doc comment. Refer 
> Documentation/doc-guide/kernel-doc.rst
> * The below shaders are used to clear SGPRS, and also write the input
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:301: warning: This comment starts 
> with '/**', but isn't a kernel-doc comment. Refer 
> Documentation/doc-guide/kernel-doc.rst
> * This shader is used to clear the uninitiated sgprs after the above
>
> Fixes: 0e0036c7d13b ("drm/amdgpu: fix no full coverage issue for gprs 
> initialization")
> Signed-off-by: Randy Dunlap 
> Reported-by: kernel test robot 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: "Pan, Xinhui" 
> Cc: Dennis Li 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- linux-next-20210806.orig/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
> +++ linux-next-20210806/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
> @@ -85,7 +85,7 @@ static const struct soc15_reg_golden gol
> SOC15_REG_GOLDEN_VALUE(GC, 0, regTCI_CNTL_3, 0xff, 0x20),
>  };
>
> -/**
> +/*
>   * This shader is used to clear VGPRS and LDS, and also write the input
>   * pattern into the write back buffer, which will be used by driver to
>   * check whether all SIMDs have been covered.
> @@ -206,7 +206,7 @@ const struct soc15_reg_entry vgpr_init_r
> { SOC15_REG_ENTRY(GC, 0, regCOMPUTE_STATIC_THREAD_MGMT_SE7), 
> 0x },
>  };
>
> -/**
> +/*
>   * The below shaders are used to clear SGPRS, and also write the input
>   * pattern into the write back buffer. The first two dispatch should be
>   * scheduled simultaneously which make sure that all SGPRS could be
> @@ -302,7 +302,7 @@ const struct soc15_reg_entry sgpr96_init
> { SOC15_REG_ENTRY(GC, 0, regCOMPUTE_STATIC_THREAD_MGMT_SE7), 
> 0x },
>  };
>
> -/**
> +/*
>   * This shader is used to clear the uninitiated sgprs after the above
>   * two dispatches, because of hardware feature, dispath 0 couldn't clear
>   * top hole sgprs. Therefore need 4 waves per SIMD to cover these sgprs


Re: [PATCH] drm/amd/display: use do-while-0 for DC_TRACE_LEVEL_MESSAGE()

2021-08-09 Thread Alex Deucher
On Sun, Aug 8, 2021 at 10:52 PM Randy Dunlap  wrote:
>
> Building with W=1 complains about an empty 'else' statement, so use the
> usual do-nothing-while-0 loop to quieten this warning.
>
> ../drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:113:53: warning: 
> suggest braces around empty body in an 'else' statement [-Wempty-body]
>   113 | *state, retry_count);
>
> Fixes: b30eda8d416c ("drm/amd/display: Add ETW log to dmub_psr_get_state")
> Signed-off-by: Randy Dunlap 
> Cc: Wyatt Wood 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: "Pan, Xinhui" 
> Cc: Harry Wentland 
> Cc: Leo Li 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: David Airlie 
> Cc: Daniel Vetter 

Applied.  Thanks!

Alex

> ---
>  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linux-next-20210806.orig/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> +++ linux-next-20210806/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> @@ -29,7 +29,7 @@
>  #include "dmub/dmub_srv.h"
>  #include "core_types.h"
>
> -#define DC_TRACE_LEVEL_MESSAGE(...) /* do nothing */
> +#define DC_TRACE_LEVEL_MESSAGE(...)do {} while (0) /* do nothing */
>
>  #define MAX_PIPES 6
>


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
 wrote:
>
> On 2021-08-09 23:37, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> >  wrote:
> >>
> >> On 2021-08-09 23:10, Will Deacon wrote:
> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >> >> >
> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  
> >> >> > > > > wrote:
> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon 
> >> >> > > > > > >  wrote:
> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash 
> >> >> > > > > > > > Ranjan wrote:
> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> >> >> > > > > > > > > > Ranjan wrote:
> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> >> >> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
> >> >> > > > > > > > > > > along with it went
> >> >> > > > > > > > > > > the memory type setting required for the 
> >> >> > > > > > > > > > > non-coherent masters to use
> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU 
> >> >> > > > > > > > > > > is added, we will

Re: [Intel-gfx] [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:27:01PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> > Calling switch_to_kernel_context isn't needed if the engine PM reference
> > is taken while all contexts are pinned. By not calling
> > switch_to_kernel_context we save on issuing a request to the engine.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
> > b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 1f07ac4e0672..58099de6bf07 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct 
> > intel_engine_cs *engine)
> > unsigned long flags;
> > bool result = true;
> >  
> > +   /* No need to switch_to_kernel_context if GuC submission */
> 
> Maybe whack a big FIXME on here that we should unravel this properly.

Sure, can add a FIXME here.

> Currently the execlist backend assumptions are leaked all over the place,
> leading to stuff like this. Which means extremely fragile code.
>

Yes, this is something required for execlists implemented in what should be
generic code.

> I currently don't have a great idea on how exactly we should do that, but
> oh well.

Me neither, it will be a process.

> 
> btw just in case we ever want to make guc lrc properly evictable (which as
> the og use-case for this function, way, way back), would we need to fully

Can you explain what you mean by fully evictable? Not getting what you
mean in this context.

> unregister them from guc? At least I'm assuming there's no other trick

If scheduling is disabled on the context (currently done on unpin) you are
free to move anything around, as the GuC is guaranteed not to touch the
context state. If on re-pin something has moved (e.g. the LRC vaddr is
different), you need to unregister and re-register the context with the
GuC.
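
That re-pin flow could be sketched as rough pseudocode (the function and
field names here are made up for illustration and are not the actual i915
implementation):

    /* Pseudocode sketch: re-pinning a context whose backing store may have
     * moved. Assumes scheduling was disabled at unpin time, so the GuC is
     * guaranteed not to touch the context state while we update the mapping.
     */
    guc_context_repin(ce, new_vaddr):
        if (ce->lrc_vaddr != new_vaddr) {
            /* Backing store moved while unpinned: the GuC still holds the
             * old descriptor, so tear it down and register the new one. */
            unregister_context(ce);
            ce->lrc_vaddr = new_vaddr;
            register_context(ce);
        }
        /* else: nothing moved, the existing GuC registration stays valid */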

> like the below one.
> 
> Another aside: How does the perf/OA patching work on GuC?
>

Not my area of expertise, but perf is somewhat a WIP. The plan is for the
GuC to write out some stats to the HWSP, I think? John Harrison is working
to get this fully implemented.

OA is working afaik, with Umesh Nerlige Ramappa being the expert here.

Matt

> Anyway, patch looks legit:
> 
> Reviewed-by: Daniel Vetter 
> 
> 
> > +   if (intel_engine_uses_guc(engine))
> > +   return true;
> > +
> > /* GPU is pointing to the void, as good as in the kernel context. */
> > if (intel_gt_is_wedged(engine->gt))
> > return true;
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 04:23:42PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:07PM -0700, Matthew Brost wrote:
> > Taking a PM reference to prevent intel_gt_wait_for_idle from short
> > circuiting while a scheduling of user context could be enabled.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/Makefile |  1 +
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
> >  2 files changed, 34 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 903de270f2db..5e3a1e2095b0 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -103,6 +103,7 @@ gt-y += \
> > gt/intel_gt_clock_utils.o \
> > gt/intel_gt_irq.o \
> > gt/intel_gt_pm.o \
> > +   gt/intel_gt_pm_unpark_work.o \
> 
> This file isn't here?
> 

Yep, included this in the wrong patch. Should be in:
https://patchwork.freedesktop.org/patch/448462/?series=92789&rev=2

> Also pm stuff tends to have very nasty locking requirements, doing special
> stuff like this in the backend tends to lead to really big surprises. I
> think two options to make sure our locking design stays consistent:
> - Lift this to generic code.

Not sure I'm following this, intel_engine_pm_get/put are generic calls.
Those calls should have all the correct annotations. If they don't we can
add them.

Matt

> - expose some engine_pm_might_get/put() calls which do have the right set
>   of might_lock annotations, and call those in the generic code.
> 
> Imo the worst kernel abstractions are those where all implementations
> look&act the same, except for locking. Unfortunately i915-gem code is full
> of this stuff, and we need to stop this by enlisting lockdep to check the
> contracts for us.
> -Daniel
> 
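
For reference, the might_lock() annotation pattern Daniel mentions works by
having the fast path declare the lock dependency unconditionally, so lockdep
records it on every run rather than only on the rare slow path that actually
takes the lock. A rough pseudocode sketch of what such a wrapper could look
like (might_lock() is a real lockdep helper; the wrapper name and the field
it annotates are hypothetical):

    engine_pm_might_get(engine):
        might_lock(&engine->wakeref.mutex); /* teach lockdep the dependency */
        intel_engine_pm_get(engine);        /* fast path: lock only on 0 -> 1 */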
> > gt/intel_gt_pm_irq.o \
> > gt/intel_gt_requests.o \
> > gt/intel_gtt.o \
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 7fe4d1559a81..c5d9548bfd00 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2056,7 +2056,12 @@ static int guc_context_pre_pin(struct intel_context 
> > *ce,
> >  
> >  static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >  {
> > -   return __guc_context_pin(ce, ce->engine, vaddr);
> > +   int ret = __guc_context_pin(ce, ce->engine, vaddr);
> > +
> > +   if (likely(!ret && !intel_context_is_barrier(ce)))
> > +   intel_engine_pm_get(ce->engine);
> > +
> > +   return ret;
> >  }
> >  
> >  static void guc_context_unpin(struct intel_context *ce)
> > @@ -2067,6 +2072,9 @@ static void guc_context_unpin(struct intel_context 
> > *ce)
> >  
> > unpin_guc_id(guc, ce, true);
> > lrc_unpin(ce);
> > +
> > +   if (likely(!intel_context_is_barrier(ce)))
> > +   intel_engine_pm_put(ce->engine);
> >  }
> >  
> >  static void guc_context_post_unpin(struct intel_context *ce)
> > @@ -3002,8 +3010,30 @@ static int guc_virtual_context_pre_pin(struct 
> > intel_context *ce,
> >  static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> >  {
> > struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +   int ret = __guc_context_pin(ce, engine, vaddr);
> > +   intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +
> > +   if (likely(!ret))
> > +   for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +   intel_engine_pm_get(engine);
> >  
> > -   return __guc_context_pin(ce, engine, vaddr);
> > +   return ret;
> > +}
> > +
> > +static void guc_virtual_context_unpin(struct intel_context *ce)
> > +{
> > +   intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +   struct intel_engine_cs *engine;
> > +   struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +   GEM_BUG_ON(context_enabled(ce));
> > +   GEM_BUG_ON(intel_context_is_barrier(ce));
> > +
> > +   unpin_guc_id(guc, ce, true);
> > +   lrc_unpin(ce);
> > +
> > +   for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +   intel_engine_pm_put(engine);
> >  }
> >  
> >  static void guc_virtual_context_enter(struct intel_context *ce)
> > @@ -3040,7 +3070,7 @@ static const struct intel_context_ops 
> > virtual_guc_context_ops = {
> >  
> > .pre_pin = guc_virtual_context_pre_pin,
> > .pin = guc_virtual_context_pin,
> > -   .unpin = guc_context_unpin,
> > +   .unpin = guc_virtual_context_unpin,
> > .post_unpin = guc_context_post_unpin,
> >  
> > .ban = guc_context_ban,
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-09 23:37, Rob Clark wrote:

On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan  wrote:


On 2021-08-09 23:10, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>> >
>> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
>> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
wrote:
>> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
wrote:
>> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
Ranjan wrote:
>> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
IOMMU_SYS_CACHE_ONLY flag")
>> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
with it went
>> > > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
>> > > > > > > > > > > system cache. Now that system cache support for GPU is 
added, we will
>> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to 
be sys cached.
>> > > > > > > > > > > Without this, the system cache lines are not allocated 
for GPU.
>> > > > > > > > > > >
>> > > > > > > > > > > So the patches in this series introduces a new prot flag 
IOMMU_LLC,
>> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
>> > > > > > > > > > > and makes GPU the user of this protection flag.
>> > > > > > > > > >
>> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
it, as it does
>> > > > > > > > > > not apply anymore?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
needed, then
>> > > > > > > > > I can repost the patch.
>> > > > > > > >
>> > > > > > > > I still think you need to handle the mismatched alias, no? 
You're adding
>> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
side. That
>> > > > > > > > can't be right.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on 
the
>> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> > > > > > > doesn't really care about the smmu pgtable attributes..
>> > > > > >
>> > > > > > If the CPU accesses a shared buffer with different attributes to 
those which
>> > > > > > the device is using then you fall into the "mismatched memory 
attributes"
>> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
should go and
>> > > > > > read it) and in some cases can apply to speculative accesses as 
well, but
>> > > > > > the end result is typically loss of coherency.
>> > > > >
>> > > > > Ok, I might have a few other sections to read first to decipher the
>> > > > > terminology..
>> > > > >
>> > > > > But my understanding of LLC is that it looks just like system memory
>> > > > > to the CPU and GPU (I think that would make it "the point of
>> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it 
be
>> > > > > invisible from the point of view of different CPU mapping options?
>> > > >
>> > > > You could certainly build a system where mismatched attributes don't 
cause
>> > > > loss of coherence, but as it's not guaranteed by the architecture and 
the
>> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> > > > don't think it helps much.
>> > > >
>> > >
>> > > Hmm, the description of the new mapping flag is that it applies only
>> > > to transparent outer level cache:
>> > >
>> > > +/*
>> > > + * Non-coherent masters can use this page protection flag to set 
cacheable
>> > > + * memory attributes for only a transparent outer level of cache, also 
known as
>> > > + * the last-level or system cache.
>> > > + */
>> > > +#define IOMMU_LLC  (1 << 6)
>> > >
>> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > like that to make it more clear that it is not necessarily something
>> > > that would work with a different outer level cache implementation?
>> >
>> > ... or we could just deal with the problem so that other people can reuse
>> > the code. I haven't really understood the reluctance to solve this 
properly.
>> >
>> > Am I missing some reason this isn't solvable?
>>
>> Oh, was there another way to solve it (other than foregoing setting
>> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> corresponding setting on the MMU pgtables side of things?
>
> Right -- we just need to program the CPU's MMU with the matching memory
> attributes! It's a bit more fiddly if you're j

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan  wrote:
>
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >> >
> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
> >> > > > > > > wrote:
> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> >> > > > > > > > wrote:
> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> >> > > > > > > > > > Ranjan wrote:
> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> >> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
> >> > > > > > > > > > > along with it went
> >> > > > > > > > > > > the memory type setting required for the non-coherent 
> >> > > > > > > > > > > masters to use
> >> > > > > > > > > > > system cache. Now that system cache support for GPU is 
> >> > > > > > > > > > > added, we will
> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to 
> >> > > > > > > > > > > be sys cached.
> >> > > > > > > > > > > Without this, the system cache lines are not allocated 
> >> > > > > > > > > > > for GPU.
> >> > > > > > > > > > >
> >> > > > > > > > > > > So the patches in this series introduces a new prot 
> >> > > > > > > > > > > flag IOMMU_LLC,
> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> >> > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> > > > > > > > > >
> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
> >> > > > > > > > > > it, as it does
> >> > > > > > > > > > not apply anymore?
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> >> > > > > > > > > needed, then
> >> > > > > > > > > I can repost the patch.
> >> > > > > > > >
> >> > > > > > > > I still think you need to handle the mismatched alias, no? 
> >> > > > > > > > You're adding
> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
> >> > > > > > > > side. That
> >> > > > > > > > can't be right.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Just curious, and maybe this is a dumb question, but what is 
> >> > > > > > > your
> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy 
> >> > > > > > > on the
> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different 
> >> > > > > > > and
> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> > > > > >
> >> > > > > > If the CPU accesses a shared buffer with different attributes to 
> >> > > > > > those which
> >> > > > > > the device is using then you fall into the "mismatched memory 
> >> > > > > > attributes"
> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
> >> > > > > > should go and
> >> > > > > > read it) and in some cases can apply to speculative accesses as 
> >> > > > > > well, but
> >> > > > > > the end result is typically loss of coherency.
> >> > > > >
> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> > > > > terminology..
> >> > > > >
> >> > > > > But my understanding of LLC is that it looks just like system 
> >> > > > > memory
> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't 
> >> > > > > it be
> >> > > > > invisible from the point of view of different CPU mapping options?
> >> > > >
> >> > > > You could certainly build a system where mismatched attributes don't 
> >> > > > cause
> >> > > > loss of coherence, but as it's not guaranteed by the architecture 
> >> > > > and the
> >> > > > changes proposed here affect APIs which are exposed across SoCs, 
> >> > > > then I
> >> > > > don't think it helps much.
> >> > > >
> >> > >
> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> > > to transparent outer level cache:
> >> > >
> >> > > +/*
> >> > > + * Non-coherent masters can use this page protection flag to set 
> >> > > cacheable
> >> > > + * memory attributes for only a transparent outer level of cache, 
> >> > > also known as
> >> > > + * the last-level or system cache.
> >> > > + */
> >> > > +#define IOMMU_LLC  (1 << 6)
> >> > >
> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> > > like that to make it more clear that it is not necess

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 10:28 AM Akhil P Oommen  wrote:
>
> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
> >
> >
> > On 09/08/2021 17:12, Rob Clark wrote:
> >> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen 
> >> wrote:
> >>>
> >>> On 8/8/2021 10:22 PM, Rob Clark wrote:
>  On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly
>   wrote:
> >
> >
> >
> > On 07/08/2021 21:04, Rob Clark wrote:
> >> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
> >>  wrote:
> >>>
> >>> Hi Rob, Akhil,
> >>>
> >>> On 29/07/2021 21:53, Rob Clark wrote:
>  On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
>   wrote:
> >
> >
> >
> > On 29/07/2021 21:24, Rob Clark wrote:
> >> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> >>  wrote:
> >>>
> >>> Hi Rob,
> >>>
> >>> I've done some more testing! It looks like before that patch
> >>> ("drm/msm: Devfreq tuning") the GPU would never get above
> >>> the second frequency in the OPP table (342MHz) (at least, not
> >>> in glxgears). With the patch applied it would more
> >>> aggressively jump up to the max frequency which seems to be
> >>> unstable at the default regulator voltages.
> >>
> >> *ohh*, yeah, ok, that would explain it
> >>
> >>> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up
> >>> to 0.988v (instead of the stock 0.516v) makes the GPU stable
> >>> at the higher frequencies.
> >>>
> >>> Applying this patch reverts the behaviour, and the GPU never
> >>> goes above 342MHz in glxgears, losing ~30% performance in
> >>> glxgear.
> >>>
> >>> I think (?) that enabling CPR support would be the proper
> >>> solution to this - that would ensure that the regulators run
> >>> at the voltage the hardware needs to be stable.
> >>>
> >>> Is hacking the voltage higher (although ideally not quite
> >>> that high) an acceptable short term solution until we have
> >>> CPR? Or would it be safer to just not make use of the higher
> >>> frequencies on a630 for now?
> >>>
> >>
> >> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is
> >> already
> >> on CC and I added sboyd, maybe one of them knows better.
> >>
> >> In the short term, removing the higher problematic OPPs from
> >> dts might
> >> be a better option than this patch (which I'm dropping), since
> >> there
> >> is nothing stopping other workloads from hitting higher OPPs.
> > Oh yeah that sounds like a more sensible workaround than mine .
> >>
> >> I'm slightly curious why I didn't have problems at higher OPPs
> >> on my
> >> c630 laptop (sdm850)
> > Perhaps you won the sillicon lottery - iirc sdm850 is binned
> > for higher clocks as is out of the factory.
> >
> > Would it be best to drop the OPPs for all devices? Or just
> > those affected? I guess it's possible another c630 might
> > crash where yours doesn't?
> 
>  I've not heard any reports of similar issues from the handful of
>  other
>  folks with c630's on #aarch64-laptops.. but I can't really say
>  if that
>  is luck or not.
> >>> It looks like this affects at least the OnePlus 6 and PocoPhone
> >>> F1, I've done some more poking and the following diff
> >>> seems to fix the stability issues completely, it seems the delay
> >>> is required to let the update propagate.
> >>>
> >>> This doesn't feel like the right fix, but hopefully it's enough
> >>> to come up with a better solution than disabling the new
> >>> devfreq behaviour on a630.
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>> index d7cec7f0dde0..69e2a5e84dae 100644
> >>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>> @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu,
> >>> struct dev_pm_opp *opp)
> >>> return;
> >>> }
> >>>
> >>> +   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> >>> +
> >>> +   usleep_range(300, 500);
> >>> +
> >>
> >>>
> >>> I am a bit confused. We don't define a power domain for gpu in dt,
> >>> correct? Then what exactly set_opp do here? Do you think this usleep is
> >>> what is helping here somehow to mask the issue?
> > The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
> > the GPU DT. For the sake of simplicity I'll refer to the lowest
> > frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
> > the "min" state, and the highest frequency (7

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-09 23:10, Will Deacon wrote:

On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:

On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
with it went
> > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is 
added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be 
sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for 
GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag 
IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, 
as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're 
adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to 
those which
> > > > > the device is using then you fall into the "mismatched memory 
attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should 
go and
> > > > > read it) and in some cases can apply to speculative accesses as well, 
but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also 
known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC  (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?


Right -- we just need to program the CPU's MMU with the matching memory
attributes! It's a bit more fiddly if you're just using ioremap_wc()
though, as it's usually the DMA API which handles the attributes under the
hood.

Anyway, sorry, I should've said that explicitly earlier on. We've done this
sort of thing in the Android tree so I assumed Sai knew what needed to be
done and then I didn't think to explain to you :(



Right I was aware of that but even in th

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >
> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
> > > > > > > wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> > > > > > > > wrote:
> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> > > > > > > > > > Ranjan wrote:
> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
> > > > > > > > > > > with it went
> > > > > > > > > > > the memory type setting required for the non-coherent 
> > > > > > > > > > > masters to use
> > > > > > > > > > > system cache. Now that system cache support for GPU is 
Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Akhil P Oommen

On 8/9/2021 9:48 PM, Caleb Connolly wrote:



On 09/08/2021 17:12, Rob Clark wrote:
On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen  
wrote:


On 8/8/2021 10:22 PM, Rob Clark wrote:
On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly 
 wrote:




On 07/08/2021 21:04, Rob Clark wrote:

On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
 wrote:


Hi Rob, Akhil,

On 29/07/2021 21:53, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
 wrote:




On 29/07/2021 21:24, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
 wrote:


Hi Rob,

I've done some more testing! It looks like before that patch 
("drm/msm: Devfreq tuning") the GPU would never get above
the second frequency in the OPP table (342MHz) (at least, not 
in glxgears). With the patch applied it would more
aggressively jump up to the max frequency which seems to be 
unstable at the default regulator voltages.


*ohh*, yeah, ok, that would explain it

Hacking the pm8005 s1 regulator (which provides VDD_GFX) up 
to 0.988v (instead of the stock 0.516v) makes the GPU stable

at the higher frequencies.

Applying this patch reverts the behaviour, and the GPU never 
goes above 342MHz in glxgears, losing ~30% performance in

glxgears.

I think (?) that enabling CPR support would be the proper 
solution to this - that would ensure that the regulators run

at the voltage the hardware needs to be stable.

Is hacking the voltage higher (although ideally not quite 
that high) an acceptable short term solution until we have
CPR? Or would it be safer to just not make use of the higher 
frequencies on a630 for now?




tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is 
already

on CC and I added sboyd, maybe one of them knows better.

In the short term, removing the higher problematic OPPs from 
dts might
be a better option than this patch (which I'm dropping), since 
there

is nothing stopping other workloads from hitting higher OPPs.

Oh yeah that sounds like a more sensible workaround than mine.


I'm slightly curious why I didn't have problems at higher OPPs 
on my

c630 laptop (sdm850)
Perhaps you won the silicon lottery - iirc sdm850 is binned 
for higher clocks as-is out of the factory.


Would it be best to drop the OPPs for all devices? Or just 
those affected? I guess it's possible another c630 might

crash where yours doesn't?


I've not heard any reports of similar issues from the handful of 
other
folks with c630's on #aarch64-laptops.. but I can't really say 
if that

is luck or not.
It looks like this affects at least the OnePlus 6 and PocoPhone 
F1. I've done some more poking, and the following diff
seems to fix the stability issues completely; it seems the delay 
is required to let the update propagate.


This doesn't feel like the right fix, but hopefully it's enough 
to come up with a better solution than disabling the new

devfreq behaviour on a630.

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c

index d7cec7f0dde0..69e2a5e84dae 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, 
struct dev_pm_opp *opp)

    return;
    }

+   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+
+   usleep_range(300, 500);
+




I am a bit confused. We don't define a power domain for gpu in dt,
correct? Then what exactly set_opp do here? Do you think this usleep is
what is helping here somehow to mask the issue?
The power domains (for cx and gx) are defined in the GMU DT, the OPPs in 
the GPU DT. For the sake of simplicity I'll refer to the lowest 
frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as 
the "min" state, and the highest frequency (71000) and OPP level 
(RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in 
sdm845.dtsi under the gpu node.


The new devfreq behaviour unmasks what I think is a driver bug, it 
inadvertently puts much more strain on the GPU regulators than they 
usually get. With the new behaviour the GPU jumps from its min state to 
the max state and back again extremely rapidly under workloads as small 
as refreshing UI. Where previously the GPU would rarely if ever go above 
342MHz when interacting with the device, it now jumps between min and 
max many times per second.


If my understanding is correct, the current implementation of the GMU 
set freq is the following:

  - Get OPP for frequency to set
  - Push the frequency to the GMU - immediately updating the core clock
  - Call dev_pm_opp_set_opp(), which triggers a notify chain; this winds 
up somewhere in power management code and causes the gx regulator level 
to be updated


Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. 
We were using a different api earlier which got deprecated - 
dev_pm_opp_set_bw().




The regulator will then take some time to reach its new voltage level 
and stabilise. I believe that ra
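
The ordering constraint being debated above (the regulator must be at an adequate voltage before the core runs at the higher frequency, and may only be dropped after the clock has been lowered) can be modelled with a small userspace sketch. All names, voltages and frequencies below are illustrative assumptions, not the actual msm/a6xx driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: 'stable' goes false if the core ever runs at a
 * frequency its current voltage cannot support. */
struct gpu_model {
    unsigned long freq_hz;
    int volt_uv;
    bool stable;
};

/* Minimum voltage each frequency needs (illustrative table). */
static int required_uv(unsigned long freq_hz)
{
    return freq_hz >= 700000000 ? 988000 :
           freq_hz >= 342000000 ? 700000 : 516000;
}

static void set_voltage(struct gpu_model *g, int uv)
{
    g->volt_uv = uv; /* models the regulator having settled */
    if (g->volt_uv < required_uv(g->freq_hz))
        g->stable = false;
}

static void set_clock(struct gpu_model *g, unsigned long freq_hz)
{
    g->freq_hz = freq_hz;
    if (g->volt_uv < required_uv(g->freq_hz))
        g->stable = false;
}

/* Correct ordering: voltage first on the way up, clock first on the
 * way down. The first branch corresponds to "update the OPP and wait
 * for the rail to settle before pushing the new frequency". */
static void set_freq_safe(struct gpu_model *g, unsigned long freq_hz)
{
    int uv = required_uv(freq_hz);

    if (freq_hz > g->freq_hz) {
        set_voltage(g, uv);
        set_clock(g, freq_hz);
    } else {
        set_clock(g, freq_hz);
        set_voltage(g, uv);
    }
}
```

In this model, writing the clock before the voltage on an upward transition is exactly the failure mode the usleep hack papers over.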

Re: [Intel-gfx] [PATCH 46/46] drm/i915/guc: Add delay before disabling scheduling on contexts

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:43PM -0700, Matthew Brost wrote:
> Some workloads use lots of contexts that continually pin / unpin
> contexts. With GuC submission an unpin translates to a schedule disable
> H2G which puts pressure on both the i915 and GuC. A schedule disable can
> also block future requests from being submitted until the operation
> completes. None of this is ideal.
> 
> Add a configurable, via debugfs, delay period before the schedule
> disable is issued. Default delay period is 1 second. The delay period is
> skipped if more than 3/4 of the guc_ids are in use.
> 
> This patch also updates the selftests to turn off this delay period as
> this extra time would likely cause many selftests to fail. Follow up
> patches will fix all the selftests and enable the delay period.
> 
> Signed-off-by: Matthew Brost 

I think this is more evidence that we should just pin/unpin context at
create/destruction time. The current scheme doesn't really work that well
and causes way more pain than benefits it seems.

If anyone screams, and that's a big if aside of some igts, we can come up
with a proper scheme to evict contexts without pin/unpin and layer hacks
over that misdesign.
-Daniel
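
The delay heuristic the commit message describes (a configurable delay before schedule disable, skipped when more than 3/4 of the guc_ids are in use) can be sketched as a small policy function. The names and the helper itself are hypothetical, not the i915 code:

```c
#include <assert.h>

#define SCHED_DISABLE_DELAY_MS_DEFAULT 1000

struct guc_ids {
    unsigned int total;
    unsigned int in_use;
};

/* Returns the delay (in ms) to apply before issuing the schedule
 * disable H2G; 0 means disable immediately. Under guc_id pressure
 * (more than 3/4 in use) the id is reclaimed right away. */
static unsigned int sched_disable_delay_ms(const struct guc_ids *ids,
                                           unsigned int configured_ms)
{
    if (ids->in_use > 3 * ids->total / 4)
        return 0;
    return configured_ms;
}
```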

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
>  .../i915/gem/selftests/i915_gem_coherency.c   |   2 +-
>  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
>  .../drm/i915/gem/selftests/i915_gem_mman.c|   2 +-
>  .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
>  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
>  drivers/gpu/drm/i915/gt/intel_context.h   |   9 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   7 +
>  .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c|  28 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 322 +-
>  .../i915/gt/uc/selftest_guc_flow_control.c|  19 +-
>  drivers/gpu/drm/i915/i915_selftest.h  |   2 +
>  drivers/gpu/drm/i915/i915_trace.h |  10 +
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
>  drivers/gpu/drm/i915/selftests/i915_perf.c|   2 +-
>  drivers/gpu/drm/i915/selftests/i915_request.c |   2 +-
>  drivers/gpu/drm/i915/selftests/i915_vma.c |   2 +-
>  18 files changed, 405 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index b199d59bd2c4..1553287e5491 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1298,7 +1298,7 @@ static void engines_idle_release(struct 
> i915_gem_context *ctx,
>   int err;
>  
>   /* serialises with execbuf */
> - set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
> + intel_context_close(ce);
>   if (!intel_context_pin_if_active(ce))
>   continue;
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c 
> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> index 13b088cc787e..a666d7e610f5 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
> @@ -434,5 +434,5 @@ int i915_gem_coherency_live_selftests(struct 
> drm_i915_private *i915)
>   SUBTEST(igt_gem_coherency),
>   };
>  
> - return i915_subtests(tests, i915);
> + return i915_live_subtests(tests, i915);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> index ffae7df5e4d7..2c92afa9d608 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
> @@ -474,5 +474,5 @@ int i915_gem_dmabuf_live_selftests(struct 
> drm_i915_private *i915)
>   SUBTEST(igt_dmabuf_import_same_driver_lmem_smem),
>   };
>  
> - return i915_subtests(tests, i915);
> + return i915_live_subtests(tests, i915);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index b20f5621f62b..4745c78a48de 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -1414,5 +1414,5 @@ int i915_gem_mman_live_selftests(struct 
> drm_i915_private *i915)
>   SUBTEST(igt_mmap_gpu),
>   };
>  
> - return i915_subtests(tests, i915);
> + return i915_live_subtests(tests, i915);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
> index 740ee8086a27..ae1361c7c4cf 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
> @@ -95,5 +95,5 @@ int i915_gem_object_live_selftests(struct drm_i915_private 
> *i915)
>   SUBTEST(igt_gem_huge),
>   };
>  

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > > > wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
> > > > > > > > > > with it went
> > > > > > > > > > the memory type setting required for the non-coherent 
> > > > > > > > > > masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is 
> > > > > > > > > > added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be 
> > > > > > > > > > sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for 
> > > > > > > > > > GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > > > IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, 
> > > > > > > > > as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> > > > > > > > needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > > > adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
> > > > > > > side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to 
> > > > > those which
> > > > > the device is using then you fall into the "mismatched memory 
> > > > > attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should 
> > > > > go and
> > > > > read it) and in some cases can apply to speculative accesses as well, 
> > > > > but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also 
> > known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC  (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?
>

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?

BR,
-R
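
The mismatched-alias concern can be made concrete with a toy mapping from IOMMU prot flags to a page-table memory attribute: the SMMU side can pick an attribute (outer/system cacheable only) that has no exact counterpart on the CPU side of the shared buffer. The attribute names and the translation function are illustrative assumptions, not the arm-smmu driver's:

```c
#include <assert.h>

#define IOMMU_CACHE (1 << 2)
#define IOMMU_LLC   (1 << 6) /* proposed flag from this series */

enum mem_attr {
    ATTR_NC,         /* non-cacheable everywhere */
    ATTR_WB,         /* inner+outer write-back; matches a cached CPU alias */
    ATTR_SYS_CACHED, /* outer cacheable only: no exact CPU-side equivalent */
};

/* Toy translation of prot flags to the memory attribute programmed
 * into the device-side page tables. */
static enum mem_attr prot_to_attr(int prot)
{
    if (prot & IOMMU_CACHE)
        return ATTR_WB;
    if (prot & IOMMU_LLC)
        return ATTR_SYS_CACHED;
    return ATTR_NC;
}
```

Whichever attribute the CPU alias uses (cacheable or non-cacheable), neither equals `ATTR_SYS_CACHED`, which is the mismatch Will is pointing at.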


Re: [Intel-gfx] [PATCH 41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB submission

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 07:07:44PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:38PM -0700, Matthew Brost wrote:
> > Certain VMA functions in the execbuf IOCTL only need to be called on
> > first or last BB of a multi-BB submission. eb_relocate() on the first
> 
> eb_relocate should be outright disallowed on multi lrc execbuf ioctl.
> There's no users of that left, and it does substantially simplify the
> entire locking problem if we outright disallow that.
> 
> > and eb_release_vmas() on the last. Doing so will save CPU / GPU cycles.
> 
> Yah for our mix of pin/unpin vs dma_resv_lock/unlock. Now with the current
> unpin design this move is ok, but we want/need to switch vma over to
> dma_resv_lock. And then it gets really nasty, because you run into a ton
> of problems.

To give a bit more context of how much this is all nasty: When you publish
a fence, which thanks to rcu lookup of dma_resv happens when you install a
fence, not when you unlock the dma_resv_lock, you're not allowed to
allocate _any_ memory anymore until your request has finished executing.
This means no allocating anything, including kmalloc for your i915_request
struct for the remaining batches, or the composite fence or anything else
you might do.

userptr also makes this requirement even more fun with additional
serialization requirements against mmu notifier invalidations.

The current execbuf code is a mess in this regard, and the idea is to fix
this with the conversion to drm/sched, because that has a very clear point
of no return. With the design you're pushing you're essentially making
this problem unfixable.
-Daniel
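
The rule described above (once a fence is published via dma_resv, the path that must signal it may not allocate memory) forces everything a multi-batch submission needs to be reserved up front. A minimal model, purely illustrative and not i915 code:

```c
#include <assert.h>
#include <stdbool.h>

/* Model: requests for the remaining batches must come from a pool
 * reserved before the fence becomes visible to waiters. */
struct submission {
    bool fence_published;
    int prealloc_requests; /* reserved before publication */
    int requests_used;
};

/* Allocation is only legal before the fence is visible. */
static bool can_alloc(const struct submission *s)
{
    return !s->fence_published;
}

static bool get_request(struct submission *s)
{
    if (s->requests_used < s->prealloc_requests) {
        s->requests_used++;  /* draw from the pre-reserved pool */
        return true;
    }
    return can_alloc(s);     /* allocating now would deadlock/recurse */
}
```

Sizing `prealloc_requests` wrong is precisely the bug class that a scheme with a clear point of no return (drm/sched) is meant to rule out.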

> 
> The more I read this, the less I like this :-/
> -Daniel
> 
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 127 +++---
> >  .../i915/gem/selftests/i915_gem_execbuffer.c  |  14 +-
> >  2 files changed, 83 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index ecdb583cc2eb..70784779872a 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -270,7 +270,7 @@ struct i915_execbuffer {
> > /** list of vma that have execobj.relocation_count */
> > struct list_head relocs;
> >  
> > -   struct i915_gem_ww_ctx ww;
> > +   struct i915_gem_ww_ctx *ww;
> >  
> > /**
> >  * Track the most recently used object for relocations, as we
> > @@ -448,7 +448,7 @@ eb_pin_vma(struct i915_execbuffer *eb,
> > pin_flags |= PIN_GLOBAL;
> >  
> > /* Attempt to reuse the current location if available */
> > -   err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
> > +   err = i915_vma_pin_ww(vma, eb->ww, 0, 0, pin_flags);
> > if (err == -EDEADLK)
> > return err;
> >  
> > @@ -457,11 +457,11 @@ eb_pin_vma(struct i915_execbuffer *eb,
> > return err;
> >  
> > /* Failing that pick any _free_ space if suitable */
> > -   err = i915_vma_pin_ww(vma, &eb->ww,
> > -entry->pad_to_size,
> > -entry->alignment,
> > -eb_pin_flags(entry, ev->flags) |
> > -PIN_USER | PIN_NOEVICT);
> > +   err = i915_vma_pin_ww(vma, eb->ww,
> > + entry->pad_to_size,
> > + entry->alignment,
> > + eb_pin_flags(entry, ev->flags) |
> > + PIN_USER | PIN_NOEVICT);
> > if (unlikely(err))
> > return err;
> > }
> > @@ -643,9 +643,9 @@ static int eb_reserve_vma(struct i915_execbuffer *eb,
> > return err;
> > }
> >  
> > -   err = i915_vma_pin_ww(vma, &eb->ww,
> > -  entry->pad_to_size, entry->alignment,
> > -  eb_pin_flags(entry, ev->flags) | pin_flags);
> > +   err = i915_vma_pin_ww(vma, eb->ww,
> > + entry->pad_to_size, entry->alignment,
> > + eb_pin_flags(entry, ev->flags) | pin_flags);
> > if (err)
> > return err;
> >  
> > @@ -940,7 +940,7 @@ static int eb_lock_vmas(struct i915_execbuffer *eb)
> > struct eb_vma *ev = &eb->vma[i];
> > struct i915_vma *vma = ev->vma;
> >  
> > -   err = i915_gem_object_lock(vma->obj, &eb->ww);
> > +   err = i915_gem_object_lock(vma->obj, eb->ww);
> > if (err)
> > return err;
> > }
> > @@ -1020,12 +1020,13 @@ eb_get_vma(const struct i915_execbuffer *eb, 
> > unsigned long handle)
> > }
> >  }
> >  
> > -static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
> > +static void eb_release_vmas(struct i915_execbuffer *eb, bool final,
> > +   bool

Re: [Intel-gfx] [PATCH 41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB submission

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:38PM -0700, Matthew Brost wrote:
> Certain VMA functions in the execbuf IOCTL only need to be called on
> first or last BB of a multi-BB submission. eb_relocate() on the first

eb_relocate should be outright disallowed on multi lrc execbuf ioctl.
There's no users of that left, and it does substantially simplify the
entire locking problem if we outright disallow that.

> and eb_release_vmas() on the last. Doing so will save CPU / GPU cycles.

Yah for our mix of pin/unpin vs dma_resv_lock/unlock. Now with the current
unpin design this move is ok, but we want/need to switch vma over to
dma_resv_lock. And then it gets really nasty, because you run into a ton
of problems.

The more I read this, the less I like this :-/
-Daniel

> 
> Signed-off-by: Matthew Brost 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 127 +++---
>  .../i915/gem/selftests/i915_gem_execbuffer.c  |  14 +-
>  2 files changed, 83 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index ecdb583cc2eb..70784779872a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -270,7 +270,7 @@ struct i915_execbuffer {
>   /** list of vma that have execobj.relocation_count */
>   struct list_head relocs;
>  
> - struct i915_gem_ww_ctx ww;
> + struct i915_gem_ww_ctx *ww;
>  
>   /**
>* Track the most recently used object for relocations, as we
> @@ -448,7 +448,7 @@ eb_pin_vma(struct i915_execbuffer *eb,
>   pin_flags |= PIN_GLOBAL;
>  
>   /* Attempt to reuse the current location if available */
> - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, pin_flags);
> + err = i915_vma_pin_ww(vma, eb->ww, 0, 0, pin_flags);
>   if (err == -EDEADLK)
>   return err;
>  
> @@ -457,11 +457,11 @@ eb_pin_vma(struct i915_execbuffer *eb,
>   return err;
>  
>   /* Failing that pick any _free_ space if suitable */
> - err = i915_vma_pin_ww(vma, &eb->ww,
> -  entry->pad_to_size,
> -  entry->alignment,
> -  eb_pin_flags(entry, ev->flags) |
> -  PIN_USER | PIN_NOEVICT);
> + err = i915_vma_pin_ww(vma, eb->ww,
> +   entry->pad_to_size,
> +   entry->alignment,
> +   eb_pin_flags(entry, ev->flags) |
> +   PIN_USER | PIN_NOEVICT);
>   if (unlikely(err))
>   return err;
>   }
> @@ -643,9 +643,9 @@ static int eb_reserve_vma(struct i915_execbuffer *eb,
>   return err;
>   }
>  
> - err = i915_vma_pin_ww(vma, &eb->ww,
> -entry->pad_to_size, entry->alignment,
> -eb_pin_flags(entry, ev->flags) | pin_flags);
> + err = i915_vma_pin_ww(vma, eb->ww,
> +   entry->pad_to_size, entry->alignment,
> +   eb_pin_flags(entry, ev->flags) | pin_flags);
>   if (err)
>   return err;
>  
> @@ -940,7 +940,7 @@ static int eb_lock_vmas(struct i915_execbuffer *eb)
>   struct eb_vma *ev = &eb->vma[i];
>   struct i915_vma *vma = ev->vma;
>  
> - err = i915_gem_object_lock(vma->obj, &eb->ww);
> + err = i915_gem_object_lock(vma->obj, eb->ww);
>   if (err)
>   return err;
>   }
> @@ -1020,12 +1020,13 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned 
> long handle)
>   }
>  }
>  
> -static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
> +static void eb_release_vmas(struct i915_execbuffer *eb, bool final,
> + bool unreserve)
>  {
>   const unsigned int count = eb->buffer_count;
>   unsigned int i;
>  
> - for (i = 0; i < count; i++) {
> + for (i = 0; unreserve && i < count; i++) {
>   struct eb_vma *ev = &eb->vma[i];
>   struct i915_vma *vma = ev->vma;
>  
> @@ -1237,7 +1238,7 @@ static void *reloc_iomap(struct drm_i915_gem_object 
> *obj,
>   if (err)
>   return ERR_PTR(err);
>  
> - vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
> + vma = i915_gem_object_ggtt_pin_ww(obj, eb->ww, NULL, 0, 0,
> PIN_MAPPABLE |
> PIN_NONBLOCK /* NOWARN */ |
> PIN_NOEVICT);
> @@ -1361,7 +1362,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>   }
>   eb->reloc_pool = NULL;
>  
> - err = i915_gem_object_lock(pool->obj, &eb->ww);
> + err = i

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > > wrote:
> > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with 
> > > > > > > > > it went
> > > > > > > > > the memory type setting required for the non-coherent masters 
> > > > > > > > > to use
> > > > > > > > > system cache. Now that system cache support for GPU is added, 
> > > > > > > > > we will
> > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > > > cached.
> > > > > > > > > Without this, the system cache lines are not allocated for 
> > > > > > > > > GPU.
> > > > > > > > >
> > > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > > IOMMU_LLC,
> > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > >
> > > > > > > > Thank you for the patchset! Are you planning to refresh it, as 
> > > > > > > > it does
> > > > > > > > not apply anymore?
> > > > > > > >
> > > > > > >
> > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> > > > > > > needed, then
> > > > > > > I can repost the patch.
> > > > > >
> > > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > > adding
> > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. 
> > > > > > That
> > > > > > can't be right.
> > > > > >
> > > > >
> > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > doesn't really care about the smmu pgtable attributes..
> > > >
> > > > If the CPU accesses a shared buffer with different attributes to those 
> > > > which
> > > > the device is using then you fall into the "mismatched memory 
> > > > attributes"
> > > > part of the Arm architecture. It's reasonably unforgiving (you should 
> > > > go and
> > > > read it) and in some cases can apply to speculative accesses as well, 
> > > > but
> > > > the end result is typically loss of coherency.
> > >
> > > Ok, I might have a few other sections to read first to decipher the
> > > terminology..
> > >
> > > But my understanding of LLC is that it looks just like system memory
> > > to the CPU and GPU (I think that would make it "the point of
> > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > invisible from the point of view of different CPU mapping options?
> >
> > You could certainly build a system where mismatched attributes don't cause
> > loss of coherence, but as it's not guaranteed by the architecture and the
> > changes proposed here affect APIs which are exposed across SoCs, then I
> > don't think it helps much.
> >
> 
> Hmm, the description of the new mapping flag is that it applies only
> to transparent outer level cache:
> 
> +/*
> + * Non-coherent masters can use this page protection flag to set cacheable
> + * memory attributes for only a transparent outer level of cache, also known 
> as
> + * the last-level or system cache.
> + */
> +#define IOMMU_LLC  (1 << 6)
> 
> But I suppose we could call it instead IOMMU_QCOM_LLC or something
> like that to make it more clear that it is not necessarily something
> that would work with a different outer level cache implementation?

... or we could just deal with the problem so that other people can reuse
the code. I haven't really understood the reluctance to solve this properly.

Am I missing some reason this isn't solvable?

Will


Re: [Intel-gfx] [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 04:39:48PM +, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 06:32:42PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:20PM -0700, Matthew Brost wrote:
> > > The GuC must receive requests in the order submitted for contexts in a
> > > parent-child relationship to function correctly. To ensure this, insert
> > > a submit fence between the current request and last request submitted
> > > for requests / contexts in a parent child relationship. This is
> > > conceptually similar to a single timeline.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > Cc: John Harrison 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
> > >  drivers/gpu/drm/i915/gt/intel_context.h   |   5 +
> > >  drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +-
> > >  drivers/gpu/drm/i915/i915_request.c   | 120 ++
> > >  5 files changed, 105 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index bb4c14656067..98ef2d0f7a39 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -487,6 +487,8 @@ void intel_context_fini(struct intel_context *ce)
> > >  {
> > >   struct intel_context *child, *next;
> > >  
> > > + if (ce->last_rq)
> > > + i915_request_put(ce->last_rq);
> > >   if (ce->timeline)
> > >   intel_timeline_put(ce->timeline);
> > >   i915_vm_put(ce->vm);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index 7ce3b3d2edb7..a302599e436a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -60,6 +60,11 @@ intel_context_to_parent(struct intel_context *ce)
> > >   return intel_context_is_child(ce) ? ce->parent : ce;
> > >  }
> > >  
> > > +static inline bool intel_context_is_parallel(struct intel_context *ce)
> > > +{
> > > + return intel_context_is_child(ce) || intel_context_is_parent(ce);
> > > +}
> > > +
> > >  void intel_context_bind_parent_child(struct intel_context *parent,
> > >struct intel_context *child);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > index 9665cb31bab0..f4fc81f64921 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > @@ -225,6 +225,9 @@ struct intel_context {
> > >*/
> > >   u8 guc_prio;
> > >   u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> > > +
> > > + /* Last request submitted on a parent */
> > > + struct i915_request *last_rq;
> > >  };
> > >  
> > >  #endif /* __INTEL_CONTEXT_TYPES__ */
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index d1d4a1e59e8d..1cb382f7d79d 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -820,8 +820,7 @@ static inline int rq_prio(const struct i915_request 
> > > *rq)
> > >  
> > >  static inline bool is_multi_lrc_rq(struct i915_request *rq)
> > >  {
> > > - return intel_context_is_child(rq->context) ||
> > > - intel_context_is_parent(rq->context);
> > > + return intel_context_is_parallel(rq->context);
> > >  }
> > >  
> > >  /*
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c 
> > > b/drivers/gpu/drm/i915/i915_request.c
> > > index ce446716d092..2e51c8999088 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
> > >   return ret;
> > >  }
> > >  
> > > +static inline bool is_parallel_rq(struct i915_request *rq)
> > > +{
> > > + return intel_context_is_parallel(rq->context);
> > > +}
> > > +
> > > +static inline struct intel_context *request_to_parent(struct 
> > > i915_request *rq)
> > > +{
> > > + return intel_context_to_parent(rq->context);
> > > +}
> > > +
> > >  static struct i915_request *
> > > -__i915_request_add_to_timeline(struct i915_request *rq)
> > > +__i915_request_ensure_parallel_ordering(struct i915_request *rq,
> > > + struct intel_timeline *timeline)
> > >  {
> > > - struct intel_timeline *timeline = i915_request_timeline(rq);
> > >   struct i915_request *prev;
> > >  
> > > - /*
> > > -  * Dependency tracking and request ordering along the timeline
> > > -  * is special cased so that we can eliminate redundant ordering
> > > -  * operations while building the request (we know that the timeline
> > > -  * itself is ordered, and here we guarantee it).
> > > -  *
> > > -  * As we know we will need to emit tracking along the timeline,
> > > -  * we embed the hoo

Re: [PATCH 40/46] drm/i915: Multi-batch execbuffer2

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:37PM -0700, Matthew Brost wrote:
> For contexts with width set to two or more, we add a mode to execbuf2
> which implies there are N batch buffers in the buffer list, each of
> which will be sent to one of the engines from the engine map array
> (I915_CONTEXT_PARAM_ENGINES, I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT).
> 
> Those N batches can be either the first N or the last N objects in the list,
> as controlled by the existing execbuffer2 flag.
> 
> The N batches will be submitted to consecutive engines from the previously
> configured allowed engine array starting at index 0.
> 
> Input and output fences are fully supported, with the latter getting
> signalled when all batch buffers have completed.
> 
> Last, it isn't safe for subsequent batches to touch any objects written
> to by a multi-BB submission until all the batches in that submission
> complete. As such all batches in a multi-BB submission must be combined
> into a single composite fence and put into the dma reservation excl
> fence slot.
> 
> Suggested-by: Tvrtko Ursulin 
> Signed-off-by: Matthew Brost 

So either I've missed something, or this has the exact same deadlock issue
as the old submit fence, except now it all happens internally in the kmd.

Also, this is bad news (if I'm right about what's going on here).

- Between each batch submission we drop the dma_resv_locks on the objects.
  This can currently even happen due to relocations within a submission,
  but since we don't allow relocations on platforms with parallel
  submit/guc scheduler, this could be worked around.

- When the buffer is unlocked someone else could step in and do exactly
  what you say is not allowed, namely touch the object.

- The individual batch fences won't complete until the last one has
  finished, leading to a deadlock which might or might not get resolved by
  gpu reset code. Since the deadlock is on the submission side I'm
  assuming the answer is "it won't be resolved by gpu reset", but maybe
  you do have a "I'm stuck for too long, let's ragequit" timer in your
  state machine somewhere. Old bonded submit would be rescued by the
  hangcheck we readded at least because there it's all completely
  free-floating requests.

- ttm on dgpu makes this all substantially worse.

The fundamental fix is still to build up a single i915_request, go through
the execbuf flow once, and then split things up again in the backend. That
would also mean all your prep work to pull execbuf prep step out of
do_execbuf() is a pure distraction.

I'm not yet fully understanding all the ordering rules drm/sched has, but
I don't think it will be any happier about this kind of submission model.

tldr; what do?

Cheers, Daniel
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 262 +++---
>  drivers/gpu/drm/i915/gt/intel_context.c   |   5 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>  drivers/gpu/drm/i915/i915_vma.c   |  13 +-
>  drivers/gpu/drm/i915/i915_vma.h   |  16 +-
>  5 files changed, 266 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b6143973ac67..ecdb583cc2eb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -252,6 +252,9 @@ struct i915_execbuffer {
>   struct eb_vma *batch; /** identity of the batch obj/vma */
>   struct i915_vma *trampoline; /** trampoline used for chaining */
>  
> + /** used for excl fence in dma_resv objects when > 1 BB submitted */
> + struct dma_fence *composite_fence;
> +
>   /* batch_index in vma list */
>   unsigned int batch_index;
>  
> @@ -367,11 +370,6 @@ static int eb_create(struct i915_execbuffer *eb)
>   eb->lut_size = -eb->buffer_count;
>   }
>  
> - if (eb->args->flags & I915_EXEC_BATCH_FIRST)
> - eb->batch_index = 0;
> - else
> - eb->batch_index = eb->args->buffer_count - 1;
> -
>   return 0;
>  }
>  
> @@ -2241,7 +2239,7 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
>   return err;
>  }
>  
> -static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
> +static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first, bool last)
>  {
>   const unsigned int count = eb->buffer_count;
>   unsigned int i = count;
> @@ -2289,8 +2287,16 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb, 
> bool first)
>   }
>  
>   if (err == 0)
> - err = i915_vma_move_to_active(vma, eb->request,
> -   flags | 
> __EXEC_OBJECT_NO_RESERVE);
> + err = _i915_vma_move_to_active(vma, eb->request,
> +flags | 
> __EXEC_OBJECT_NO_RESERVE,
> +!last ?
> + 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>
> On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > >
> > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
> > > > > > > > flag")
> > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it 
> > > > > > > > went
> > > > > > > > the memory type setting required for the non-coherent masters 
> > > > > > > > to use
> > > > > > > > system cache. Now that system cache support for GPU is added, 
> > > > > > > > we will
> > > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > > cached.
> > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > >
> > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > IOMMU_LLC,
> > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > and makes GPU the user of this protection flag.
> > > > > > >
> > > > > > > Thank you for the patchset! Are you planning to refresh it, as it 
> > > > > > > does
> > > > > > > not apply anymore?
> > > > > > >
> > > > > >
> > > > > > I was waiting on Will's reply [1]. If there are no changes needed, 
> > > > > > then
> > > > > > I can repost the patch.
> > > > >
> > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > adding
> > > > > a new memory type to the SMMU which doesn't exist on the CPU side. 
> > > > > That
> > > > > can't be right.
> > > > >
> > > >
> > > > Just curious, and maybe this is a dumb question, but what is your
> > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > doesn't really care about the smmu pgtable attributes..
> > >
> > > If the CPU accesses a shared buffer with different attributes to those 
> > > which
> > > the device is using then you fall into the "mismatched memory attributes"
> > > part of the Arm architecture. It's reasonably unforgiving (you should go 
> > > and
> > > read it) and in some cases can apply to speculative accesses as well, but
> > > the end result is typically loss of coherency.
> >
> > Ok, I might have a few other sections to read first to decipher the
> > terminology..
> >
> > But my understanding of LLC is that it looks just like system memory
> > to the CPU and GPU (I think that would make it "the point of
> > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > invisible from the point of view of different CPU mapping options?
>
> You could certainly build a system where mismatched attributes don't cause
> loss of coherence, but as it's not guaranteed by the architecture and the
> changes proposed here affect APIs which are exposed across SoCs, then I
> don't think it helps much.
>

Hmm, the description of the new mapping flag is that it applies only
to transparent outer level cache:

+/*
+ * Non-coherent masters can use this page protection flag to set cacheable
+ * memory attributes for only a transparent outer level of cache, also known as
+ * the last-level or system cache.
+ */
+#define IOMMU_LLC  (1 << 6)

But I suppose we could call it instead IOMMU_QCOM_LLC or something
like that to make it more clear that it is not necessarily something
that would work with a different outer level cache implementation?

BR,
-R


Re: [PATCH 39/46] drm/i915: Force parallel contexts to use copy engine for reloc

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:36PM -0700, Matthew Brost wrote:
> Submitting to a subset of hardware contexts is not allowed, so use the
> copy engine for GPU relocations when using a parallel context.
> 
> Signed-off-by: Matthew Brost 

Luckily I just pushed the patches to delete all this, so you can too.
-Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b224b28530d1..b6143973ac67 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1386,7 +1386,8 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>   if (err)
>   goto err_unmap;
>  
> - if (engine == eb->context->engine) {
> + if (engine == eb->context->engine &&
> + !intel_context_is_parallel(eb->context)) {
>   rq = i915_request_create(eb->context);
>   } else {
>   struct intel_context *ce = eb->reloc_context;
> @@ -1483,7 +1484,8 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
>   if (eb_use_cmdparser(eb))
>   return ERR_PTR(-EWOULDBLOCK);
>  
> - if (!reloc_can_use_engine(engine)) {
> + if (!reloc_can_use_engine(engine) ||
> + intel_context_is_parallel(eb->context)) {
>   engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
>   if (!engine)
>   return ERR_PTR(-ENODEV);
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship

2021-08-09 Thread Matthew Brost
On Mon, Aug 09, 2021 at 06:32:42PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:20PM -0700, Matthew Brost wrote:
> > The GuC must receive requests in the order submitted for contexts in a
> > parent-child relationship to function correctly. To ensure this, insert
> > a submit fence between the current request and last request submitted
> > for requests / contexts in a parent child relationship. This is
> > conceptually similar to a single timeline.
> > 
> > Signed-off-by: Matthew Brost 
> > Cc: John Harrison 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
> >  drivers/gpu/drm/i915/gt/intel_context.h   |   5 +
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +-
> >  drivers/gpu/drm/i915/i915_request.c   | 120 ++
> >  5 files changed, 105 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index bb4c14656067..98ef2d0f7a39 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -487,6 +487,8 @@ void intel_context_fini(struct intel_context *ce)
> >  {
> > struct intel_context *child, *next;
> >  
> > +   if (ce->last_rq)
> > +   i915_request_put(ce->last_rq);
> > if (ce->timeline)
> > intel_timeline_put(ce->timeline);
> > i915_vm_put(ce->vm);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 7ce3b3d2edb7..a302599e436a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -60,6 +60,11 @@ intel_context_to_parent(struct intel_context *ce)
> > return intel_context_is_child(ce) ? ce->parent : ce;
> >  }
> >  
> > +static inline bool intel_context_is_parallel(struct intel_context *ce)
> > +{
> > +   return intel_context_is_child(ce) || intel_context_is_parent(ce);
> > +}
> > +
> >  void intel_context_bind_parent_child(struct intel_context *parent,
> >  struct intel_context *child);
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 9665cb31bab0..f4fc81f64921 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -225,6 +225,9 @@ struct intel_context {
> >  */
> > u8 guc_prio;
> > u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> > +
> > +   /* Last request submitted on a parent */
> > +   struct i915_request *last_rq;
> >  };
> >  
> >  #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index d1d4a1e59e8d..1cb382f7d79d 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -820,8 +820,7 @@ static inline int rq_prio(const struct i915_request *rq)
> >  
> >  static inline bool is_multi_lrc_rq(struct i915_request *rq)
> >  {
> > -   return intel_context_is_child(rq->context) ||
> > -   intel_context_is_parent(rq->context);
> > +   return intel_context_is_parallel(rq->context);
> >  }
> >  
> >  /*
> > diff --git a/drivers/gpu/drm/i915/i915_request.c 
> > b/drivers/gpu/drm/i915/i915_request.c
> > index ce446716d092..2e51c8999088 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
> > return ret;
> >  }
> >  
> > +static inline bool is_parallel_rq(struct i915_request *rq)
> > +{
> > +   return intel_context_is_parallel(rq->context);
> > +}
> > +
> > +static inline struct intel_context *request_to_parent(struct i915_request 
> > *rq)
> > +{
> > +   return intel_context_to_parent(rq->context);
> > +}
> > +
> >  static struct i915_request *
> > -__i915_request_add_to_timeline(struct i915_request *rq)
> > +__i915_request_ensure_parallel_ordering(struct i915_request *rq,
> > +   struct intel_timeline *timeline)
> >  {
> > -   struct intel_timeline *timeline = i915_request_timeline(rq);
> > struct i915_request *prev;
> >  
> > -   /*
> > -* Dependency tracking and request ordering along the timeline
> > -* is special cased so that we can eliminate redundant ordering
> > -* operations while building the request (we know that the timeline
> > -* itself is ordered, and here we guarantee it).
> > -*
> > -* As we know we will need to emit tracking along the timeline,
> > -* we embed the hooks into our request struct -- at the cost of
> > -* having to have specialised no-allocation interfaces (which will
> > -* be beneficial elsewhere).
> > -*
> > -* A second benefit to open-coding i915_request_await_request 

Re: [PATCH 26/46] drm/i915: Connect UAPI to GuC multi-lrc interface

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:23PM -0700, Matthew Brost wrote:
> Introduce 'set parallel submit' extension to connect UAPI to GuC
> multi-lrc interface. Kernel doc in new uAPI should explain it all.
> 
> Cc: Tvrtko Ursulin 
> Signed-off-by: Matthew Brost 

UMD merge request link + igt patchwork link because this is uapi please.
-Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 157 +-
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   8 +-
>  drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
>  .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 111 +++--
>  include/uapi/drm/i915_drm.h   | 128 ++
>  9 files changed, 417 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index cff72679ad7c..2b0dd3ff4db8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -515,9 +515,149 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
> __user *base, void *data)
>   return 0;
>  }
>  
> +static int
> +set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user 
> *base,
> +   void *data)
> +{
> + struct i915_context_engines_parallel_submit __user *ext =
> + container_of_user(base, typeof(*ext), base);
> + const struct set_proto_ctx_engines *set = data;
> + struct drm_i915_private *i915 = set->i915;
> + u64 flags;
> + int err = 0, n, i, j;
> + u16 slot, width, num_siblings;
> + struct intel_engine_cs **siblings = NULL;
> + intel_engine_mask_t prev_mask;
> +
> + /* Disabling for now */
> + return -ENODEV;
> +
> + if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
> + return -ENODEV;
> +
> + if (get_user(slot, &ext->engine_index))
> + return -EFAULT;
> +
> + if (get_user(width, &ext->width))
> + return -EFAULT;
> +
> + if (get_user(num_siblings, &ext->num_siblings))
> + return -EFAULT;
> +
> + if (slot >= set->num_engines) {
> + drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
> + slot, set->num_engines);
> + return -EINVAL;
> + }
> +
> + if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
> + drm_dbg(&i915->drm,
> + "Invalid placement[%d], already occupied\n", slot);
> + return -EINVAL;
> + }
> +
> + if (get_user(flags, &ext->flags))
> + return -EFAULT;
> +
> + if (flags) {
> + drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
> + return -EINVAL;
> + }
> +
> + for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
> + err = check_user_mbz(&ext->mbz64[n]);
> + if (err)
> + return err;
> + }
> +
> + if (width < 2) {
> + drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
> + return -EINVAL;
> + }
> +
> + if (num_siblings < 1) {
> + drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
> + num_siblings);
> + return -EINVAL;
> + }
> +
> + siblings = kmalloc_array(num_siblings * width,
> +  sizeof(*siblings),
> +  GFP_KERNEL);
> + if (!siblings)
> + return -ENOMEM;
> +
> + /* Create contexts / engines */
> + for (i = 0; i < width; ++i) {
> + intel_engine_mask_t current_mask = 0;
> + struct i915_engine_class_instance prev_engine;
> +
> + for (j = 0; j < num_siblings; ++j) {
> + struct i915_engine_class_instance ci;
> +
> + n = i * num_siblings + j;
> + if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
> + err = -EFAULT;
> + goto out_err;
> + }
> +
> + siblings[n] =
> + intel_engine_lookup_user(i915, ci.engine_class,
> +  ci.engine_instance);
> + if (!siblings[n]) {
> + drm_dbg(&i915->drm,
> + "Invalid sibling[%d]: { class:%d, 
> inst:%d }\n",
> + n, ci.engine_class, ci.engine_instance);
> + err = -EINVAL;
> + goto out_err;
> + }
> +
> + if (n) {
> + if (prev_engine.engine_class !=
> + ci.engine_

Re: [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:22PM -0700, Matthew Brost wrote:
> Display the workqueue status in debugfs for GuC contexts that are in
a parent-child relationship.
> 
> Signed-off-by: Matthew Brost 
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 +--
>  1 file changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 30df1c8db491..44a7582c9aed 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -4527,31 +4527,53 @@ void intel_guc_submission_print_info(struct intel_guc 
> *guc,
>   gse_log_submission_info(guc->gse[i], p, i);
>  }
>  
> +static inline void guc_log_context(struct drm_printer *p,
> +struct intel_context *ce)
> +{
> + drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> + drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> + drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> +ce->ring->head,
> +ce->lrc_reg_state[CTX_RING_HEAD]);
> + drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> +ce->ring->tail,
> +ce->lrc_reg_state[CTX_RING_TAIL]);
> + drm_printf(p, "\t\tContext Pin Count: %u\n",
> +atomic_read(&ce->pin_count));
> + drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> +atomic_read(&ce->guc_id_ref));
> + drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> +atomic_read(&ce->guc_num_rq_not_ready));
> + drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> +ce->guc_state.sched_state,
> +atomic_read(&ce->guc_sched_state_no_lock));

It's all debugfs, but I think proper locking even there is good. It at
least reduces the confusion when the locking scheme is largely
undocumented. Also given how much we have rcu for everything would be good
to double-check that all pointer dereferences are properly protected.

> +}
> +
>  void intel_guc_submission_print_context_info(struct intel_guc *guc,
>struct drm_printer *p)
>  {
>   struct intel_context *ce;
>   unsigned long index;
>   xa_for_each(&guc->context_lookup, index, ce) {

xa_for_each doesn't provide any guarantees, so doesn't protect against
concurrent removal or anything like that. We need to do better than that.
-Daniel

> - drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> - drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> - drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> -ce->ring->head,
> -ce->lrc_reg_state[CTX_RING_HEAD]);
> - drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> -ce->ring->tail,
> -ce->lrc_reg_state[CTX_RING_TAIL]);
> - drm_printf(p, "\t\tContext Pin Count: %u\n",
> -atomic_read(&ce->pin_count));
> - drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> -atomic_read(&ce->guc_id_ref));
> - drm_printf(p, "\t\tNumber Requests Not Ready: %u\n",
> -atomic_read(&ce->guc_num_rq_not_ready));
> - drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> -ce->guc_state.sched_state,
> -atomic_read(&ce->guc_sched_state_no_lock));
> + GEM_BUG_ON(intel_context_is_child(ce));
>  
> + guc_log_context(p, ce);
>   guc_log_context_priority(p, ce);
> +
> + if (intel_context_is_parent(ce)) {
> + struct guc_process_desc *desc = __get_process_desc(ce);
> + struct intel_context *child;
> +
> + drm_printf(p, "\t\tWQI Head: %u\n",
> +READ_ONCE(desc->head));
> + drm_printf(p, "\t\tWQI Tail: %u\n",
> +READ_ONCE(desc->tail));
> + drm_printf(p, "\t\tWQI Status: %u\n\n",
> +READ_ONCE(desc->wq_status));
> +
> + for_each_child(ce, child)
> + guc_log_context(p, child);
> + }
>   }
>  }
>  
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:20PM -0700, Matthew Brost wrote:
> The GuC must receive requests in the order submitted for contexts in a
> parent-child relationship to function correctly. To ensure this, insert
> a submit fence between the current request and last request submitted
> for requests / contexts in a parent child relationship. This is
> conceptually similar to a single timeline.
> 
> Signed-off-by: Matthew Brost 
> Cc: John Harrison 
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
>  drivers/gpu/drm/i915/gt/intel_context.h   |   5 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +-
>  drivers/gpu/drm/i915/i915_request.c   | 120 ++
>  5 files changed, 105 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index bb4c14656067..98ef2d0f7a39 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -487,6 +487,8 @@ void intel_context_fini(struct intel_context *ce)
>  {
>   struct intel_context *child, *next;
>  
> + if (ce->last_rq)
> + i915_request_put(ce->last_rq);
>   if (ce->timeline)
>   intel_timeline_put(ce->timeline);
>   i915_vm_put(ce->vm);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index 7ce3b3d2edb7..a302599e436a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -60,6 +60,11 @@ intel_context_to_parent(struct intel_context *ce)
>   return intel_context_is_child(ce) ? ce->parent : ce;
>  }
>  
> +static inline bool intel_context_is_parallel(struct intel_context *ce)
> +{
> + return intel_context_is_child(ce) || intel_context_is_parent(ce);
> +}
> +
>  void intel_context_bind_parent_child(struct intel_context *parent,
>struct intel_context *child);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 9665cb31bab0..f4fc81f64921 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -225,6 +225,9 @@ struct intel_context {
>*/
>   u8 guc_prio;
>   u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> +
> + /* Last request submitted on a parent */
> + struct i915_request *last_rq;
>  };
>  
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index d1d4a1e59e8d..1cb382f7d79d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -820,8 +820,7 @@ static inline int rq_prio(const struct i915_request *rq)
>  
>  static inline bool is_multi_lrc_rq(struct i915_request *rq)
>  {
> - return intel_context_is_child(rq->context) ||
> - intel_context_is_parent(rq->context);
> + return intel_context_is_parallel(rq->context);
>  }
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> b/drivers/gpu/drm/i915/i915_request.c
> index ce446716d092..2e51c8999088 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
>   return ret;
>  }
>  
> +static inline bool is_parallel_rq(struct i915_request *rq)
> +{
> + return intel_context_is_parallel(rq->context);
> +}
> +
> +static inline struct intel_context *request_to_parent(struct i915_request 
> *rq)
> +{
> + return intel_context_to_parent(rq->context);
> +}
> +
>  static struct i915_request *
> -__i915_request_add_to_timeline(struct i915_request *rq)
> +__i915_request_ensure_parallel_ordering(struct i915_request *rq,
> + struct intel_timeline *timeline)
>  {
> - struct intel_timeline *timeline = i915_request_timeline(rq);
>   struct i915_request *prev;
>  
> - /*
> -  * Dependency tracking and request ordering along the timeline
> -  * is special cased so that we can eliminate redundant ordering
> -  * operations while building the request (we know that the timeline
> -  * itself is ordered, and here we guarantee it).
> -  *
> -  * As we know we will need to emit tracking along the timeline,
> -  * we embed the hooks into our request struct -- at the cost of
> -  * having to have specialised no-allocation interfaces (which will
> -  * be beneficial elsewhere).
> -  *
> -  * A second benefit to open-coding i915_request_await_request is
> -  * that we can apply a slight variant of the rules specialised
> -  * for timelines that jump between engines (such as virtual engines).
> -  * If we consider the case of virtual engine, we must emit a dma-fenc

Re: [Letux-kernel] [PATCH 8/8] drm/ingenic: Attach bridge chain to encoders

2021-08-09 Thread Paul Cercueil

Hi Nikolaus,

Le lun., août 9 2021 at 13:14:03 +0200, H. Nikolaus Schaller 
 a écrit :

Hi Paul,
quick feedback: our HDMI on top compiles fine after fixing 2 merge 
conflicts, but does not yet work.
Will need some spare time with access to the CI20 board to research
the issue, i.e. I cannot give feedback immediately.


Alright, no problem. I'll be back home in about 2 weeks and then I can 
test on my CI20 as well.


Cheers,
-Paul


BR and thanks,
Nikolaus

 Am 08.08.2021 um 21:12 schrieb H. Nikolaus Schaller 
:




 Am 08.08.2021 um 21:06 schrieb H. Nikolaus Schaller 
:




 Am 08.08.2021 um 21:04 schrieb Paul Cercueil 
:


 Hi Nikolaus,

 Le dim., août 8 2021 at 20:57:09 +0200, H. Nikolaus Schaller 
 a écrit :

 Hi Paul,
 all other patches apply cleanly but this one fails on top of 
v5.14-rc4.

 What base are you using?
 BR and thanks,
 Nikolaus


 The base is drm-misc (https://cgit.freedesktop.org/drm/drm-misc), 
branch drm-misc-next.


 Ok, fine!


 Contains 3 patches for drm/ingenic and after taking them first, I 
can apply the series.


 Again, BR and thanks,
 Nikolaus

 ___
 https://projects.goldelico.com/p/gta04-kernel/
 Letux-kernel mailing list
 letux-ker...@openphoenux.org
 http://lists.goldelico.com/mailman/listinfo.cgi/letux-kernel







Re: linux-next: Signed-off-by missing for commit in the drm-intel tree

2021-08-09 Thread Matt Roper
On Mon, Aug 09, 2021 at 04:05:59PM +0200, Daniel Vetter wrote:
> On Fri, Aug 06, 2021 at 09:36:56AM +0300, Joonas Lahtinen wrote:
> > Hi Matt,
> > 
> > Always use the dim tooling when applying patches, it will do the right
> > thing with regards to adding the S-o-b.
> 
> fd.o server rejects any pushes that haven't been done by dim, so how did
> this get through?

I definitely used dim for all of these patches, but I'm not sure how I
lost my s-o-b on this one.  Maybe when I edited the commit message after
'dim extract-tags' I accidentally deleted an extra line when I removed
the extract-tags marker?  It's the only patch where the line is missing,
so it's almost certainly human error on my part rather than something
dim did wrong.

> Matt, can you pls figure out and type up the patch to
> plug that hole?

Are you referring to a patch for dim here?  The i915 patch has already
landed, so we can't change its commit message now.


Matt

> 
> Thanks, Daniel
> 
> > 
> > Regards, Joonas
> > 
> > Quoting Stephen Rothwell (2021-07-15 07:18:54)
> > > Hi all,
> > > 
> > > Commit
> > > 
> > >   db47fe727e1f ("drm/i915/step: s/_revid_tbl/_revids")
> > > 
> > > is missing a Signed-off-by from its committer.
> > > 
> > > -- 
> > > Cheers,
> > > Stephen Rothwell
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795


Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Caleb Connolly




On 09/08/2021 17:12, Rob Clark wrote:

On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen  wrote:


On 8/8/2021 10:22 PM, Rob Clark wrote:

On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly  wrote:




On 07/08/2021 21:04, Rob Clark wrote:

On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
 wrote:


Hi Rob, Akhil,

On 29/07/2021 21:53, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
 wrote:




On 29/07/2021 21:24, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
 wrote:


Hi Rob,

I've done some more testing! It looks like before that patch ("drm/msm: Devfreq 
tuning") the GPU would never get above
the second frequency in the OPP table (342MHz) (at least, not in glxgears). 
With the patch applied it would more
aggressively jump up to the max frequency which seems to be unstable at the 
default regulator voltages.


*ohh*, yeah, ok, that would explain it


Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 0.988v (instead 
of the stock 0.516v) makes the GPU stable
at the higher frequencies.

Applying this patch reverts the behaviour, and the GPU never goes above 342MHz 
in glxgears, losing ~30% performance in
glxgears.

I think (?) that enabling CPR support would be the proper solution to this - 
that would ensure that the regulators run
at the voltage the hardware needs to be stable.

Is hacking the voltage higher (although ideally not quite that high) an 
acceptable short term solution until we have
CPR? Or would it be safer to just not make use of the higher frequencies on 
a630 for now?



tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
on CC and I added sboyd, maybe one of them knows better.

In the short term, removing the higher problematic OPPs from dts might
be a better option than this patch (which I'm dropping), since there
is nothing stopping other workloads from hitting higher OPPs.

Oh yeah, that sounds like a more sensible workaround than mine.


I'm slightly curious why I didn't have problems at higher OPPs on my
c630 laptop (sdm850)

Perhaps you won the silicon lottery - iirc sdm850 is binned for higher clocks 
as it comes out of the factory.

Would it be best to drop the OPPs for all devices? Or just those affected? I 
guess it's possible another c630 might
crash where yours doesn't?


I've not heard any reports of similar issues from the handful of other
folks with c630's on #aarch64-laptops.. but I can't really say if that
is luck or not.

It looks like this affects at least the OnePlus 6 and PocoPhone F1. I've done 
some more poking, and the following diff
seems to fix the stability issues completely; it seems the delay is required to 
let the update propagate.

This doesn't feel like the right fix, but hopefully it's enough to come up with 
a better solution than disabling the new
devfreq behaviour on a630.

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index d7cec7f0dde0..69e2a5e84dae 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
dev_pm_opp *opp)
return;
}

+   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+
+   usleep_range(300, 500);
+




I am a bit confused. We don't define a power domain for gpu in dt,
correct? Then what exactly set_opp do here? Do you think this usleep is
what is helping here somehow to mask the issue?
The power domains (for cx and gx) are defined in the GMU DT, the OPPs in the GPU DT. For the sake of simplicity I'll 
refer to the lowest frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as the "min" state, and the 
highest frequency (71000) and OPP level (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in 
sdm845.dtsi under the gpu node.


The new devfreq behaviour unmasks what I think is a driver bug; it inadvertently puts much more strain on the GPU 
regulators than they usually get. With the new behaviour the GPU jumps from its min state to the max state and back 
again extremely rapidly under workloads as small as refreshing UI. Where previously the GPU would rarely if ever go 
above 342MHz when interacting with the device, it now jumps between min and max many times per second.


If my understanding is correct, the current implementation of the GMU set freq 
is the following:
 - Get OPP for frequency to set
 - Push the frequency to the GMU - immediately updating the core clock
 - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds up somewhere in power management code and causes 
the gx regulator level to be updated


The regulator will then take some time to reach its new voltage level and stabilise. I believe that rapid transitions 
between min and max state - in combination with the increased current load from the GPU core - lead to the regulator 
becoming unstable (e.g. when it's requested to transition from its lowest to highest levels imm
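The sequencing problem being described here - the clock moving before the rail has settled - is the classic DVFS ordering constraint. A minimal, self-contained sketch of the safe ordering (frequencies, voltages, and function names below are invented for illustration; this is not the msm/GMU code):

```c
#include <assert.h>

/* Hypothetical OPP table: frequency (MHz) -> required rail voltage (mV).
 * Values are invented for illustration; the real ones live in the DT. */
static const unsigned long opp_freq[] = { 257, 342, 710 };
static const unsigned long opp_volt[] = { 516, 620, 988 };

static unsigned long cur_freq = 257;
static unsigned long cur_volt = 516;

static unsigned long volt_for(unsigned long freq)
{
	for (int i = 0; i < 3; i++)
		if (opp_freq[i] == freq)
			return opp_volt[i];
	return 0;
}

/* Safe DVFS ordering: raise the regulator before the clock when going up,
 * drop the clock before the regulator when going down, so the rail is
 * never below what the current clock requires. */
static void gpu_set_freq(unsigned long freq)
{
	unsigned long volt = volt_for(freq);

	if (volt > cur_volt) {
		cur_volt = volt;	/* regulator first; let it settle */
		cur_freq = freq;
	} else {
		cur_freq = freq;	/* clock first */
		cur_volt = volt;
	}
	assert(cur_volt >= volt_for(cur_freq));	/* never undervolted */
}
```

In this model the usleep in the diff above corresponds to the "let it settle" step: the voltage request must not only be issued first, it must also complete before the clock is raised.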

Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen  wrote:
>
> On 8/8/2021 10:22 PM, Rob Clark wrote:
> > On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly  
> > wrote:
> >>
> >>
> >>
> >> On 07/08/2021 21:04, Rob Clark wrote:
> >>> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
> >>>  wrote:
> 
>  Hi Rob, Akhil,
> 
>  On 29/07/2021 21:53, Rob Clark wrote:
> > On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
> >  wrote:
> >>
> >>
> >>
> >> On 29/07/2021 21:24, Rob Clark wrote:
> >>> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
> >>>  wrote:
> 
>  Hi Rob,
> 
>  I've done some more testing! It looks like before that patch 
>  ("drm/msm: Devfreq tuning") the GPU would never get above
>  the second frequency in the OPP table (342MHz) (at least, not in 
>  glxgears). With the patch applied it would more
>  aggressively jump up to the max frequency which seems to be unstable 
>  at the default regulator voltages.
> >>>
> >>> *ohh*, yeah, ok, that would explain it
> >>>
>  Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 
>  0.988v (instead of the stock 0.516v) makes the GPU stable
>  at the higher frequencies.
> 
>  Applying this patch reverts the behaviour, and the GPU never goes 
>  above 342MHz in glxgears, losing ~30% performance in
>  glxgears.
> 
>  I think (?) that enabling CPR support would be the proper solution 
>  to this - that would ensure that the regulators run
>  at the voltage the hardware needs to be stable.
> 
>  Is hacking the voltage higher (although ideally not quite that high) 
>  an acceptable short term solution until we have
>  CPR? Or would it be safer to just not make use of the higher 
>  frequencies on a630 for now?
> 
> >>>
> >>> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
> >>> on CC and I added sboyd, maybe one of them knows better.
> >>>
> >>> In the short term, removing the higher problematic OPPs from dts might
> >>> be a better option than this patch (which I'm dropping), since there
> >>> is nothing stopping other workloads from hitting higher OPPs.
> >> Oh yeah, that sounds like a more sensible workaround than mine.
> >>>
> >>> I'm slightly curious why I didn't have problems at higher OPPs on my
> >>> c630 laptop (sdm850)
> >> Perhaps you won the silicon lottery - iirc sdm850 is binned for 
> >> higher clocks as it comes out of the factory.
> >>
> >> Would it be best to drop the OPPs for all devices? Or just those 
> >> affected? I guess it's possible another c630 might
> >> crash where yours doesn't?
> >
> > I've not heard any reports of similar issues from the handful of other
> > folks with c630's on #aarch64-laptops.. but I can't really say if that
> > is luck or not.
>  It looks like this affects at least the OnePlus 6 and PocoPhone F1, I've 
>  done some more poking and the following diff
>  seems to fix the stability issues completely, it seems the delay is 
>  required to let the update propagate.
> 
>  This doesn't feel like the right fix, but hopefully it's enough to come 
>  up with a better solution than disabling the new
>  devfreq behaviour on a630.
> 
>  diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
>  b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  index d7cec7f0dde0..69e2a5e84dae 100644
>  --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>  @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
>  dev_pm_opp *opp)
> return;
> }
> 
>  +   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>  +
>  +   usleep_range(300, 500);
>  +
> >>>
>
> I am a bit confused. We don't define a power domain for gpu in dt,
> correct? Then what exactly set_opp do here? Do you think this usleep is
> what is helping here somehow to mask the issue?

Hmm, I thought "opp-level = RPMH_REGULATOR_LEVEL_*" did *something*,
but tbh I'm not sure exactly what..

> I feel we should just leave the new dcvs feature (shall we call it NAP?)
> disabled for a630 (and 10ms devfreq interval), until this is root caused.

I suppose "NAP" is a reasonable name.

But I think that reverting to previous behavior would not be enough;
there is nothing stopping devfreq from jumping from min to max freq,
which AFAIU should be enough to trigger this.  I guess that there just
hasn't been enough testing with different game workloads on those
phones to trigger this.

That said, I haven't seen similar issues on my sdm850 laptop, where I
definitely have triggered min->max freq transitions.. I guess it would be
interesting to know if this issue could be reproduced on db8

Re: [PATCH] dma-buf: Fix a few typos in dma-buf documentation

2021-08-09 Thread Randy Dunlap

On 8/9/21 5:22 AM, Gal Pressman wrote:

Fix a few typos in the documentation:
- Remove an extraneous 'or'
- 'unpins' -> 'unpin'
- 'braket' -> 'bracket'
- 'mappinsg' -> 'mappings'
- 'fullfills' -> 'fulfills'

Signed-off-by: Gal Pressman 


Reviewed-by: Randy Dunlap 

Thanks.


---
  include/linux/dma-buf.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..772403352767 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -54,7 +54,7 @@ struct dma_buf_ops {
 * device), and otherwise need to fail the attach operation.
 *
 * The exporter should also in general check whether the current
-* allocation fullfills the DMA constraints of the new device. If this
+* allocation fulfills the DMA constraints of the new device. If this
 * is not the case, and the allocation cannot be moved, it should also
 * fail the attach operation.
 *
@@ -146,7 +146,7 @@ struct dma_buf_ops {
 *
 * Returns:
 *
-* A &sg_table scatter list of or the backing storage of the DMA buffer,
+* A &sg_table scatter list of the backing storage of the DMA buffer,
 * already mapped into the device address space of the &device attached
 * with the provided &dma_buf_attachment. The addresses and lengths in
 * the scatter list are PAGE_SIZE aligned.
@@ -168,7 +168,7 @@ struct dma_buf_ops {
 *
 * This is called by dma_buf_unmap_attachment() and should unmap and
 * release the &sg_table allocated in @map_dma_buf, and it is mandatory.
-* For static dma_buf handling this might also unpins the backing
+* For static dma_buf handling this might also unpin the backing
 * storage if this is the last mapping of the DMA buffer.
 */
void (*unmap_dma_buf)(struct dma_buf_attachment *,
@@ -237,7 +237,7 @@ struct dma_buf_ops {
 * This callback is used by the dma_buf_mmap() function
 *
 * Note that the mapping needs to be incoherent, userspace is expected
-* to braket CPU access using the DMA_BUF_IOCTL_SYNC interface.
+* to bracket CPU access using the DMA_BUF_IOCTL_SYNC interface.
 *
 * Because dma-buf buffers have invariant size over their lifetime, the
 * dma-buf core checks whether a vma is too large and rejects such
@@ -464,7 +464,7 @@ static inline bool dma_buf_is_dynamic(struct dma_buf 
*dmabuf)
  
  /**

   * dma_buf_attachment_is_dynamic - check if a DMA-buf attachment uses dynamic
- * mappinsg
+ * mappings
   * @attach: the DMA-buf attachment to check
   *
   * Returns true if a DMA-buf importer wants to call the map/unmap functions 
with




--
~Randy



Re: [PATCH 1/2] dt-bindings: add bindings for the Sharp LS060T1SX01 panel

2021-08-09 Thread Rob Herring
On Sun, 08 Aug 2021 06:50:52 +0300, Dmitry Baryshkov wrote:
> Add devicetree bindings for the Sharp LS060T1SX01 6.0" FullHD panel
> using NT35695 driver. This panel can be found i.e. in the Dragonboard
> Display Adapter bundle.
> 
> Signed-off-by: Dmitry Baryshkov 
> ---
>  .../display/panel/sharp,ls060t1sx01.yaml  | 51 +++
>  1 file changed, 51 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
./Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml: $id: 
relative path/filename doesn't match actual path or filename
expected: 
http://devicetree.org/schemas/display/panel/sharp,ls060t1sx01.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/panel/sharp,ls060t1sx01.yaml:
 duplicate '$id' value 
'http://devicetree.org/schemas/display/panel/sharp,ls043t1le01.yaml#'

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1514772

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



Re: [Intel-gfx] [PATCH 21/46] drm/i915/guc: Add guc_child_context_destroy

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:18PM -0700, Matthew Brost wrote:
> Since child contexts do not own the guc_ids or GuC context registration,
> child contexts can simply be freed on destroy. Add
> guc_child_context_destroy context operation to do this.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 2d8296bcc583..850edeff9230 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2828,6 +2828,13 @@ static void destroy_worker_func(struct work_struct *w)
>   intel_gt_pm_unpark_work_add(gt, destroy_worker);
>  }
>  
> +/* Future patches will use this function */
> +__maybe_unused

Pure bikeshed, but for something this small just squash it in with the
first user. This kinda does nothing alone.
-Daniel

> +static void guc_child_context_destroy(struct kref *kref)
> +{
> + __guc_context_destroy(container_of(kref, struct intel_context, ref));
> +}
> +
>  static void guc_context_destroy(struct kref *kref)
>  {
>   struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 20/46] drm/i915/guc: Add hang check to GuC submit engine

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:17PM -0700, Matthew Brost wrote:
> The heartbeat uses a single instance of a GuC submit engine (GSE) to do
> the hang check. As such, if a different GSE's state machine hangs, the
> heartbeat cannot detect it. Add a timer to each GSE which in turn
> can disable all submissions if that GSE is hung.
> 
> Cc: John Harrison 
> Signed-off-by: Matthew Brost 
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++
>  .../i915/gt/uc/intel_guc_submission_types.h   |  3 ++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index afb9b4bb8971..2d8296bcc583 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -105,15 +105,21 @@ static bool tasklet_blocked(struct guc_submit_engine 
> *gse)
>   return test_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
>  }
>  
> +/* 2 seconds seems like a reasonable timeout waiting for a G2H */
> +#define MAX_TASKLET_BLOCKED_NS   20
>  static void set_tasklet_blocked(struct guc_submit_engine *gse)
>  {
>   lockdep_assert_held(&gse->sched_engine.lock);
> + hrtimer_start_range_ns(&gse->hang_timer,
> +ns_to_ktime(MAX_TASKLET_BLOCKED_NS), 0,
> +HRTIMER_MODE_REL_PINNED);
>   set_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);

So with drm/scheduler the reset handling is assumed to be
single-threaded, and there's quite complex rules around that. I've
recently worked with Boris Brezillion to clarify all this a bit and
improve docs. Does this all still work in that glorious future? Might be
good to at least sprinkle some comments/thoughts around in the commit
message about the envisaged future direction for all this stuff, to keep
people in the loop. Especially future people.

Ofc plan is still to just largely land all this.

Also: set_bit is an unordered atomic, which means you need barriers, which
means ... *insert the full rant about justifying/documenting lockless
algorithms from earlier*

But I think this all falls out with the removal of the guc-id allocation
scheme?
-Daniel

>  }
>  
>  static void __clr_tasklet_blocked(struct guc_submit_engine *gse)
>  {
>   lockdep_assert_held(&gse->sched_engine.lock);
> + hrtimer_cancel(&gse->hang_timer);
>   clear_bit(GSE_STATE_TASKLET_BLOCKED, &gse->flags);
>  }
>  
> @@ -1028,6 +1034,7 @@ static void disable_submission(struct intel_guc *guc)
>   if (__tasklet_is_enabled(&sched_engine->tasklet)) {
>   GEM_BUG_ON(!guc->ct.enabled);
>   __tasklet_disable_sync_once(&sched_engine->tasklet);
> + hrtimer_try_to_cancel(&guc->gse[i]->hang_timer);
>   sched_engine->tasklet.callback = NULL;
>   }
>   }
> @@ -3750,6 +3757,33 @@ static void guc_sched_engine_destroy(struct kref *kref)
>   kfree(gse);
>  }
>  
> +static enum hrtimer_restart gse_hang(struct hrtimer *hrtimer)
> +{
> + struct guc_submit_engine *gse =
> + container_of(hrtimer, struct guc_submit_engine, hang_timer);
> + struct intel_guc *guc = gse->sched_engine.private_data;
> +
> +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> + if (guc->gse_hang_expected)
> + drm_dbg(&guc_to_gt(guc)->i915->drm,
> + "GSE[%i] hung, disabling submission", gse->id);
> + else
> + drm_err(&guc_to_gt(guc)->i915->drm,
> + "GSE[%i] hung, disabling submission", gse->id);
> +#else
> + drm_err(&guc_to_gt(guc)->i915->drm,
> + "GSE[%i] hung, disabling submission", gse->id);
> +#endif
> +
> + /*
> +  * Tasklet not making forward progress, disable submission which in turn
> +  * will kick in the heartbeat to do a full GPU reset.
> +  */
> + disable_submission(guc);
> +
> + return HRTIMER_NORESTART;
> +}
> +
>  static void guc_submit_engine_init(struct intel_guc *guc,
>  struct guc_submit_engine *gse,
>  int id)
> @@ -3767,6 +3801,8 @@ static void guc_submit_engine_init(struct intel_guc 
> *guc,
>   sched_engine->retire_inflight_request_prio =
>   guc_retire_inflight_request_prio;
>   sched_engine->private_data = guc;
> + hrtimer_init(&gse->hang_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + gse->hang_timer.function = gse_hang;
>   gse->id = id;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> index a5933e07bdd2..eae2e9725ede 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> @@ -6,6 +6,8 @@
>  #ifndef _INTEL_GUC_SUBMISSION_TYPES_H_
>  #define _INTEL_GUC_SUBMISSION_TYPES_H_
>  
> +#include 
> +
>  #in
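The watchdog logic being reviewed above can be modelled without any timer machinery at all: record a deadline when the tasklet blocks on a G2H, and have a periodic check disable submission once that deadline passes. The sketch below is a thread-free simplification with invented names (`gse_model`, `check_hang`), not the i915 code - in the real driver the check runs asynchronously from an hrtimer callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2 seconds, matching the intent of MAX_TASKLET_BLOCKED_NS in the patch. */
#define MODEL_MAX_BLOCKED_NS (2ULL * 1000 * 1000 * 1000)

struct gse_model {
	bool blocked;         /* stands in for GSE_STATE_TASKLET_BLOCKED */
	bool submission_ok;   /* cleared when the watchdog fires */
	uint64_t deadline_ns; /* when the awaited G2H is considered lost */
};

/* Arm the watchdog when the tasklet blocks waiting for a G2H. */
static void set_tasklet_blocked(struct gse_model *g, uint64_t now_ns)
{
	g->blocked = true;
	g->deadline_ns = now_ns + MODEL_MAX_BLOCKED_NS;
}

/* Disarm it when the G2H arrives. */
static void clr_tasklet_blocked(struct gse_model *g)
{
	g->blocked = false;
}

/* The hrtimer callback's job: if the tasklet is still blocked past the
 * deadline, disable submission so the heartbeat escalates to a reset. */
static void check_hang(struct gse_model *g, uint64_t now_ns)
{
	if (g->blocked && now_ns >= g->deadline_ns)
		g->submission_ok = false;
}
```

The ordering questions raised in the review (set_bit vs. hrtimer_start, barriers) are exactly about making the real, concurrent version of these three functions behave like this sequential model.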

Re: [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:16PM -0700, Matthew Brost wrote:
> Assign contexts in parent-child relationship consecutive guc_ids. This
> is accomplished by partitioning guc_id space between ones that need to
> be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
> available guc_ids). The consecutive search is implemented via the bitmap
> API.
> 
> This is a precursor to the full GuC multi-lrc implementation but aligns
> to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
> when using the GuC multi-lrc interface.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_context.h   |   6 +
>  drivers/gpu/drm/i915/gt/intel_reset.c |   3 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|   7 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 222 --
>  .../i915/gt/uc/intel_guc_submission_types.h   |  10 +
>  5 files changed, 179 insertions(+), 69 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index c208691fc87d..7ce3b3d2edb7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -54,6 +54,12 @@ static inline bool intel_context_is_parent(struct 
> intel_context *ce)
>   return !!ce->guc_number_children;
>  }
>  
> +static inline struct intel_context *
> +intel_context_to_parent(struct intel_context *ce)
> +{
> + return intel_context_is_child(ce) ? ce->parent : ce;
> +}
> +
>  void intel_context_bind_parent_child(struct intel_context *parent,
>struct intel_context *child);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
> b/drivers/gpu/drm/i915/gt/intel_reset.c
> index ea763138197f..c3d4baa1b2b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -849,6 +849,7 @@ static void reset_finish(struct intel_gt *gt, 
> intel_engine_mask_t awake)
>  
>  static void nop_submit_request(struct i915_request *request)
>  {
> + struct intel_context *ce = intel_context_to_parent(request->context);
>   RQ_TRACE(request, "-EIO\n");
>  
>   /*
> @@ -857,7 +858,7 @@ static void nop_submit_request(struct i915_request 
> *request)
>* this for now.
>*/
>   if (intel_engine_uses_guc(request->engine))
> - intel_guc_decr_num_rq_not_ready(request->context);
> + intel_guc_decr_num_rq_not_ready(ce);
>  
>   request = i915_request_mark_eio(request);
>   if (request) {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index c0c60ccabfa4..30a0f364db8f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -24,6 +24,7 @@ struct __guc_ads_blob;
>  
>  enum {
>   GUC_SUBMIT_ENGINE_SINGLE_LRC,
> + GUC_SUBMIT_ENGINE_MULTI_LRC,
>   GUC_SUBMIT_ENGINE_MAX
>  };
>  
> @@ -59,8 +60,10 @@ struct intel_guc {
>   struct ida guc_ids;
>   u32 num_guc_ids;
>   u32 max_guc_ids;
> - struct list_head guc_id_list_no_ref;
> - struct list_head guc_id_list_unpinned;
> + unsigned long *guc_ids_bitmap;
> +#define MAX_GUC_ID_ORDER (order_base_2(MAX_ENGINE_INSTANCE + 1))
> + struct list_head guc_id_list_no_ref[MAX_GUC_ID_ORDER + 1];
> + struct list_head guc_id_list_unpinned[MAX_GUC_ID_ORDER + 1];

Random new global lists definitely need kerneldoc about what is on them,
how they're linked, what their lifetime rules are and what locks we're
holding.

Leaving this all to reviews to figure out, and worse, future readers of
your code, is not kind.

>   spinlock_t destroy_lock;/* protects list / worker */
>   struct list_head destroyed_contexts;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f23dd716723f..afb9b4bb8971 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -169,6 +169,15 @@ static void clr_guc_ids_exhausted(struct 
> guc_submit_engine *gse)
>   clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
>  }
>  
> +/*
> + * We reserve 1/16 of the guc_ids for multi-lrc as these need to be 
> contiguous

I think it'd be good to spell out the reason here. Is this a
requirement of the GuC interface, or just an artifact of our current
implementation? In the latter case, also explain what exactly the
constraint is (but honestly I can't think of many reasons for that)
-Daniel

> + * and a different allocation algorithm is used (bitmap vs. ida). We believe 
> the
> + * number of multi-lrc contexts in use should be low and 1/16 should be
> + * sufficient. Minimum of 32 ids for multi-lrc.
> + */
> +#define NUMBER_MULTI_LRC_GUC_ID(guc) \
> + ((guc)->num_guc_ids / 16 > 32 ? (guc)->num_guc_ids / 16 : 32)
> +
>  /*
>   * Below is a set of functions which control the GuC schedul
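The consecutive-id allocation the commit message describes can be sketched as a first-fit search over a free map. The kernel patch works on a real bitmap via the bitmap API (e.g. bitmap_find_next_zero_area()); the byte-per-id map and the function names below are simplifications for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of NUMBER_MULTI_LRC_GUC_ID: 1/16 of the ids, with a floor of 32. */
static unsigned int multi_lrc_ids(unsigned int num_guc_ids)
{
	unsigned int n = num_guc_ids / 16;

	return n > 32 ? n : 32;
}

/* First-fit search for 'count' consecutive free ids in a byte-per-id map.
 * Returns the first id of the run, or -1 if no run fits. */
static int alloc_consecutive(unsigned char *used, size_t nids, size_t count)
{
	for (size_t base = 0; base + count <= nids; base++) {
		size_t n = 0;

		while (n < count && !used[base + n])
			n++;
		if (n == count) {
			for (size_t i = 0; i < count; i++)
				used[base + i] = 1;
			return (int)base;
		}
		base += n; /* jump past the id that broke the run */
	}
	return -1;
}
```

This also makes the cost trade-off visible: unlike an ida, a run allocator can fail from fragmentation even when enough free ids exist, which is one motivation for fencing multi-lrc allocations into their own partition.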

Re: [Intel-gfx] [PATCH 16/46] drm/i915/guc: Implement GuC parent-child context pin / unpin functions

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:13PM -0700, Matthew Brost wrote:
> Implement GuC parent-child context pin / unpin functions in which, if any
> context in the relationship is pinned, all the contexts are pinned. The
> parent owns most of the pinning / unpinning process and the children
> direct any pins / unpins to the parent.
> 
> Patch implements a number of unused functions that will be connected
> later in the series.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c   | 187 --
>  drivers/gpu/drm/i915/gt/intel_context.h   |  43 +---
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +-
>  .../drm/i915/gt/intel_execlists_submission.c  |  25 ++-
>  drivers/gpu/drm/i915/gt/intel_lrc.c   |  26 +--
>  drivers/gpu/drm/i915/gt/intel_lrc.h   |   6 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   5 +-
>  drivers/gpu/drm/i915/gt/mock_engine.c |   4 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 183 +++--
>  9 files changed, 371 insertions(+), 112 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index 8cb92b10b547..bb4c14656067 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -158,8 +158,8 @@ static void __ring_retire(struct intel_ring *ring)
>   intel_ring_unpin(ring);
>  }
>  
> -static int intel_context_pre_pin(struct intel_context *ce,
> -  struct i915_gem_ww_ctx *ww)
> +static int __intel_context_pre_pin(struct intel_context *ce,
> +struct i915_gem_ww_ctx *ww)
>  {
>   int err;
>  
> @@ -190,7 +190,7 @@ static int intel_context_pre_pin(struct intel_context *ce,
>   return err;
>  }
>  
> -static void intel_context_post_unpin(struct intel_context *ce)
> +static void __intel_context_post_unpin(struct intel_context *ce)
>  {
>   if (ce->state)
>   __context_unpin_state(ce->state);
> @@ -199,13 +199,85 @@ static void intel_context_post_unpin(struct 
> intel_context *ce)
>   __ring_retire(ce->ring);
>  }
>  
> -int __intel_context_do_pin_ww(struct intel_context *ce,
> -   struct i915_gem_ww_ctx *ww)
> +static int intel_context_pre_pin(struct intel_context *ce,
> +  struct i915_gem_ww_ctx *ww)
>  {
> - bool handoff = false;
> - void *vaddr;
> + struct intel_context *child;
> + int err, i = 0;
> +
> + GEM_BUG_ON(intel_context_is_child(ce));
> +
> + for_each_child(ce, child) {
> + err = __intel_context_pre_pin(child, ww);
> + if (unlikely(err))
> + goto unwind;
> + ++i;
> + }
> +
> + err = __intel_context_pre_pin(ce, ww);
> + if (unlikely(err))
> + goto unwind;
> +
> + return 0;
> +
> +unwind:
> + for_each_child(ce, child) {
> + if (!i--)
> + break;
> + __intel_context_post_unpin(ce);
> + }
> +
> + return err;
> +}
> +
> +static void intel_context_post_unpin(struct intel_context *ce)
> +{
> + struct intel_context *child;
> +
> + GEM_BUG_ON(intel_context_is_child(ce));
> +
> + for_each_child(ce, child)
> + __intel_context_post_unpin(child);
> +
> + __intel_context_post_unpin(ce);
> +}
> +
> +static int __do_ww_lock(struct intel_context *ce,
> + struct i915_gem_ww_ctx *ww)
> +{
> + int err = i915_gem_object_lock(ce->timeline->hwsp_ggtt->obj, ww);
> +
> + if (!err && ce->ring->vma->obj)
> + err = i915_gem_object_lock(ce->ring->vma->obj, ww);
> + if (!err && ce->state)
> + err = i915_gem_object_lock(ce->state->obj, ww);
> +
> + return err;
> +}
> +
> +static int do_ww_lock(struct intel_context *ce,
> +   struct i915_gem_ww_ctx *ww)
> +{
> + struct intel_context *child;
>   int err = 0;
>  
> + GEM_BUG_ON(intel_context_is_child(ce));
> +
> + for_each_child(ce, child) {
> + err = __do_ww_lock(child, ww);
> + if (unlikely(err))
> + return err;
> + }
> +
> + return __do_ww_lock(ce, ww);
> +}
> +
> +static int __intel_context_do_pin_ww(struct intel_context *ce,
> +  struct i915_gem_ww_ctx *ww)
> +{
> + bool handoff = false;
> + int err;
> +
>   if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
>   err = intel_context_alloc_state(ce);
>   if (err)
> @@ -217,14 +289,11 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
>* refcount for __intel_context_active(), which prevent a lock
>* inversion of ce->pin_mutex vs dma_resv_lock().
>*/
> + err = do_ww_lock(ce, ww);
> + if (err)
> + return err;
>  
> - err = i915_gem_object_lock(ce->timeline->hwsp_ggtt->obj, ww);
> - if (!err && ce->ring->vma-
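The pre-pin/unwind shape used in the patch - pin every child, then the parent, and on failure release exactly the children pinned so far - is a standard partial-failure cleanup idiom. A self-contained sketch under invented names (not the i915 API):

```c
#include <assert.h>
#include <stddef.h>

struct ctx {
	int pinned;
};

static int pin_one(struct ctx *c, int fail)
{
	if (fail)
		return -1;
	c->pinned++;
	return 0;
}

static void unpin_one(struct ctx *c)
{
	c->pinned--;
}

/* Pin every child, then the parent.  On any failure, unwind exactly the
 * children pinned so far -- the unwind touches children[i], never the
 * parent, which at that point was not pinned. */
static int pin_family(struct ctx *parent, struct ctx *children, size_t n,
		      size_t fail_at /* index that fails; > n for none */)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pin_one(&children[i], i == fail_at))
			goto unwind;

	if (pin_one(parent, n == fail_at))
		goto unwind;

	return 0;

unwind:
	while (i--)
		unpin_one(&children[i]);
	return -1;
}
```

The invariant worth testing in any implementation of this pattern is that after a failed pin the pin counts of every object - children and parent alike - are back where they started.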

[GIT PULL] mediatek drm fixes for 5.14

2021-08-09 Thread Chun-Kuang Hu
Hi, Dave & Daniel:

This includes:

1. Fix dpi bridge bug.
2. Fix cursor plane no update.

Regards,
Chun-Kuang.

The following changes since commit e73f0f0ee7541171d89f2e2491130c7771ba58d3:

  Linux 5.14-rc1 (2021-07-11 15:07:40 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux.git 
tags/mediatek-drm-fixes-5.14

for you to fetch changes up to 1a64a7aff8da352c9419de3d5c34343682916411:

  drm/mediatek: Fix cursor plane no update (2021-07-22 22:57:52 +0800)


Mediatek DRM Fixes for Linux 5.14

1. Fix dpi bridge bug.
2. Fix cursor plane no update.


Frank Wunderlich (1):
  drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check

Hsin-Yi Wang (1):
  drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge

jason-jh.lin (1):
  drm/mediatek: Fix cursor plane no update

 drivers/gpu/drm/mediatek/mtk_dpi.c   |  6 +++-
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c  |  3 --
 drivers/gpu/drm/mediatek/mtk_drm_plane.c | 60 ++--
 3 files changed, 39 insertions(+), 30 deletions(-)


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> >
> > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > >
> > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
> > > > > > > flag")
> > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it 
> > > > > > > went
> > > > > > > the memory type setting required for the non-coherent masters to 
> > > > > > > use
> > > > > > > system cache. Now that system cache support for GPU is added, we 
> > > > > > > will
> > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > cached.
> > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > >
> > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > IOMMU_LLC,
> > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > and makes GPU the user of this protection flag.
> > > > > >
> > > > > > Thank you for the patchset! Are you planning to refresh it, as it 
> > > > > > does
> > > > > > not apply anymore?
> > > > > >
> > > > >
> > > > > I was waiting on Will's reply [1]. If there are no changes needed, 
> > > > > then
> > > > > I can repost the patch.
> > > >
> > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > can't be right.
> > > >
> > >
> > > Just curious, and maybe this is a dumb question, but what is your
> > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > GPU device side (anything beyond the LLC) is pretty different and
> > > doesn't really care about the smmu pgtable attributes..
> >
> > If the CPU accesses a shared buffer with different attributes to those which
> > the device is using then you fall into the "mismatched memory attributes"
> > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > read it) and in some cases can apply to speculative accesses as well, but
> > the end result is typically loss of coherency.
> 
> Ok, I might have a few other sections to read first to decipher the
> terminology..
> 
> But my understanding of LLC is that it looks just like system memory
> to the CPU and GPU (I think that would make it "the point of
> coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> invisible from the point of view of different CPU mapping options?

You could certainly build a system where mismatched attributes don't cause
loss of coherence, but as it's not guaranteed by the architecture, and the
changes proposed here affect APIs which are exposed across SoCs, I don't
think it helps much.

Will


Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-08-09 Thread Akhil P Oommen

On 8/8/2021 10:22 PM, Rob Clark wrote:

On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly  wrote:




On 07/08/2021 21:04, Rob Clark wrote:

On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
 wrote:


Hi Rob, Akhil,

On 29/07/2021 21:53, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
 wrote:




On 29/07/2021 21:24, Rob Clark wrote:

On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
 wrote:


Hi Rob,

I've done some more testing! It looks like before that patch ("drm/msm: Devfreq 
tuning") the GPU would never get above
the second frequency in the OPP table (342MHz) (at least, not in glxgears). 
With the patch applied it would more
aggressively jump up to the max frequency which seems to be unstable at the 
default regulator voltages.


*ohh*, yeah, ok, that would explain it


Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 0.988v (instead 
of the stock 0.516v) makes the GPU stable
at the higher frequencies.

Applying this patch reverts the behaviour, and the GPU never goes above 342MHz 
in glxgears, losing ~30% performance in
glxgear.

I think (?) that enabling CPR support would be the proper solution to this - 
that would ensure that the regulators run
at the voltage the hardware needs to be stable.

Is hacking the voltage higher (although ideally not quite that high) an 
acceptable short term solution until we have
CPR? Or would it be safer to just not make use of the higher frequencies on 
a630 for now?



tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
on CC and I added sboyd, maybe one of them knows better.

In the short term, removing the higher problematic OPPs from dts might
be a better option than this patch (which I'm dropping), since there
is nothing stopping other workloads from hitting higher OPPs.

Oh yeah, that sounds like a more sensible workaround than mine.


I'm slightly curious why I didn't have problems at higher OPPs on my
c630 laptop (sdm850)

Perhaps you won the silicon lottery - iirc sdm850 is binned for higher
clocks as it comes out of the factory.

Would it be best to drop the OPPs for all devices? Or just those affected? I 
guess it's possible another c630 might
crash where yours doesn't?


I've not heard any reports of similar issues from the handful of other
folks with c630's on #aarch64-laptops.. but I can't really say if that
is luck or not.

It looks like this affects at least the OnePlus 6 and PocoPhone F1. I've done
some more poking, and the following diff seems to fix the stability issues
completely; it seems the delay is required to let the update propagate.

This doesn't feel like the right fix, but hopefully it's enough to come up with 
a better solution than disabling the new
devfreq behaviour on a630.

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index d7cec7f0dde0..69e2a5e84dae 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
dev_pm_opp *opp)
   return;
   }

+   dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+
+   usleep_range(300, 500);
+




I am a bit confused. We don't define a power domain for the gpu in dt,
correct? Then what exactly does set_opp do here? Do you think this usleep is
what is helping here somehow to mask the issue?


I feel we should just leave the new dcvs feature (shall we call it NAP?) 
disabled for a630 (and 10ms devfreq interval), until this is root caused.



Hmm, this is going to be in the critical path on idle -> active
transition (ie. think response time to user-input).. so we defn don't
want to do this unconditionally..

If I understand the problem, we just want to limit how far we jump the
gpu freq in one go.. maybe deleting the lowest (and perhaps highest)
OPP would accomplish that?  Could that be done in the board(s)'s
toplevel dts files?

That would be a workaround; however, I'd really like to avoid limiting
performance as a solution if I can help it, especially as the fix might just be
"set the opp first, wait for it to apply, then set the core clock".

Is there a sensible way to get a callback from the opp notify chain? Or from 
rpmh directly? Or is this solution really
not the right way to go?


It does seem a bit strange to me that we are telling GMU to change
freq before calling dev_pm_opp_set_opp()..  if dev_pm_opp_set_opp() is
increasing voltage, it seems like you'd want to do that *before*
increasing freq (but reverse the order when decreasing freq).. But I'm
not an expert on the ways of the GMU..  maybe Akhil or Jordan knows
better how this is supposed to work.


For legacy gmu, we trigger DCVS using DCVS OOB which comes later in this 
function. But the order between regulator and clock which you mentioned 
is correct.




But the delay seems like papering something over, and I'm trying to go
in the other direction and reduce latency between user input and
pageflip..

BR,
-R



BR,
-R


   

Re: [PATCH v6 1/1] drm/mediatek: force hsa hbp hfp packets multiple of lanenum to avoid screen shift

2021-08-09 Thread Chun-Kuang Hu
Hi, Jitao:

Jitao Shi  於 2021年8月8日 週日 下午9:41寫道:
>
> The bridge chip ANX7625 requires the packets on lanes aligned at the end,
> or ANX7625 will shift the screen.

In anx7625_attach_dsi(), it calls mipi_dsi_attach(), which calls into
mtk_dsi_host_attach().
I would like to pass this information from the anx7625 driver to the mtk_dsi
driver at attach time.

Regards,
Chun-Kuang.

>
> Signed-off-by: Jitao Shi 
> ---
>  drivers/gpu/drm/mediatek/mtk_dsi.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
> b/drivers/gpu/drm/mediatek/mtk_dsi.c
> index ae403c67cbd9..033234d51e86 100644
> --- a/drivers/gpu/drm/mediatek/mtk_dsi.c
> +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
> @@ -194,6 +194,8 @@ struct mtk_dsi {
> struct clk *hs_clk;
>
> u32 data_rate;
> +   /* force dsi line end without dsi_null data */
> +   bool force_dsi_end_without_null;
>
> unsigned long mode_flags;
> enum mipi_dsi_pixel_format format;
> @@ -499,6 +501,13 @@ static void mtk_dsi_config_vdo_timing(struct mtk_dsi 
> *dsi)
> DRM_WARN("HFP + HBP less than d-phy, FPS will under 60Hz\n");
> }
>
> +   if (dsi->force_dsi_end_without_null) {
> +   horizontal_sync_active_byte = 
> roundup(horizontal_sync_active_byte, dsi->lanes) - 2;
> +   horizontal_frontporch_byte = 
> roundup(horizontal_frontporch_byte, dsi->lanes) - 2;
> +   horizontal_backporch_byte = 
> roundup(horizontal_backporch_byte, dsi->lanes) - 2;
> +   horizontal_backporch_byte -= (vm->hactive * dsi_tmp_buf_bpp + 
> 2) % dsi->lanes;
> +   }
> +
> writel(horizontal_sync_active_byte, dsi->regs + DSI_HSA_WC);
> writel(horizontal_backporch_byte, dsi->regs + DSI_HBP_WC);
> writel(horizontal_frontporch_byte, dsi->regs + DSI_HFP_WC);
> @@ -1095,6 +1104,10 @@ static int mtk_dsi_probe(struct platform_device *pdev)
> dsi->bridge.of_node = dev->of_node;
> dsi->bridge.type = DRM_MODE_CONNECTOR_DSI;
>
> +   if (dsi->next_bridge)
> +   dsi->force_dsi_end_without_null = 
> of_device_is_compatible(dsi->next_bridge->of_node,
> + 
> "analogix,anx7625");
> +
> drm_bridge_add(&dsi->bridge);
>
> ret = component_add(&pdev->dev, &mtk_dsi_component_ops);
> --
> 2.25.1


Re: [Intel-gfx] [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship

2021-08-09 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 04:37:55PM +0200, Daniel Vetter wrote:
> On Tue, Aug 03, 2021 at 03:29:12PM -0700, Matthew Brost wrote:
> > Introduce context parent-child relationship. Once this relationship is
> > created all pinning / unpinning operations are directed to the parent
> > context. The parent context is responsible for pinning all of its
> > children and itself.
> > 
> > This is a precursor to the full GuC multi-lrc implementation but aligns
> > to how the GuC multi-lrc interface is defined - a single H2G is used to
> > register / deregister all of the contexts simultaneously.
> > 
> > Subsequent patches in the series will implement the pinning / unpinning
> > operations for parent / child contexts.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c   | 29 +++
> >  drivers/gpu/drm/i915/gt/intel_context.h   | 18 
> >  drivers/gpu/drm/i915/gt/intel_context_types.h | 12 
> >  3 files changed, 59 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 745e84c72c90..8cb92b10b547 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -395,6 +395,8 @@ intel_context_init(struct intel_context *ce, struct 
> > intel_engine_cs *engine)
> > spin_lock_init(&ce->guc_state.lock);
> > INIT_LIST_HEAD(&ce->guc_state.fences);
> >  
> > +   INIT_LIST_HEAD(&ce->guc_child_list);
> > +
> > spin_lock_init(&ce->guc_active.lock);
> > INIT_LIST_HEAD(&ce->guc_active.requests);
> >  
> > @@ -414,10 +416,17 @@ intel_context_init(struct intel_context *ce, struct 
> > intel_engine_cs *engine)
> >  
> >  void intel_context_fini(struct intel_context *ce)
> >  {
> > +   struct intel_context *child, *next;
> > +
> > if (ce->timeline)
> > intel_timeline_put(ce->timeline);
> > i915_vm_put(ce->vm);
> >  
> > +   /* Need to put the creation ref for the children */
> > +   if (intel_context_is_parent(ce))
> > +   for_each_child_safe(ce, child, next)
> > +   intel_context_put(child);
> > +
> > mutex_destroy(&ce->pin_mutex);
> > i915_active_fini(&ce->active);
> >  }
> > @@ -533,6 +542,26 @@ struct i915_request 
> > *intel_context_find_active_request(struct intel_context *ce)
> > return active;
> >  }
> >  
> > +void intel_context_bind_parent_child(struct intel_context *parent,
> > +struct intel_context *child)
> > +{
> > +   /*
> > +* Callers responsibility to validate that this function is used
> > +* correctly but we use GEM_BUG_ON here ensure that they do.
> > +*/
> > +   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
> > +   GEM_BUG_ON(intel_context_is_pinned(parent));
> > +   GEM_BUG_ON(intel_context_is_child(parent));
> > +   GEM_BUG_ON(intel_context_is_pinned(child));
> > +   GEM_BUG_ON(intel_context_is_child(child));
> > +   GEM_BUG_ON(intel_context_is_parent(child));
> > +
> > +   parent->guc_number_children++;
> > +   list_add_tail(&child->guc_child_link,
> > + &parent->guc_child_list);
> > +   child->parent = parent;
> > +}
> > +
> >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >  #include "selftest_context.c"
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index c41098950746..ad6ce5ac4824 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -44,6 +44,24 @@ void intel_context_free(struct intel_context *ce);
> >  int intel_context_reconfigure_sseu(struct intel_context *ce,
> >const struct intel_sseu sseu);
> >  
> > +static inline bool intel_context_is_child(struct intel_context *ce)
> > +{
> > +   return !!ce->parent;
> > +}
> > +
> > +static inline bool intel_context_is_parent(struct intel_context *ce)
> > +{
> > +   return !!ce->guc_number_children;
> > +}
> > +
> > +void intel_context_bind_parent_child(struct intel_context *parent,
> > +struct intel_context *child);
> > +
> > +#define for_each_child(parent, ce)\
> > +   list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
> > +#define for_each_child_safe(parent, ce, cn)\
> > +   list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, 
> > guc_child_link)
> > +
> >  /**
> >   * intel_context_lock_pinned - Stablises the 'pinned' status of the HW 
> > context
> >   * @ce - the context
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 2df79ba39867..66b22b370a72 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -202,6 +202,18 @@ struct intel_context {
> > /* GuC context blocked fence */
> > struct i915_sw_fence guc_blocked;
> >  
> > +   /* Head of children list or link 

Re: [Intel-gfx] [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:12PM -0700, Matthew Brost wrote:
> Introduce context parent-child relationship. Once this relationship is
> created all pinning / unpinning operations are directed to the parent
> context. The parent context is responsible for pinning all of its
> children and itself.
> 
> This is a precursor to the full GuC multi-lrc implementation but aligns
> to how the GuC multi-lrc interface is defined - a single H2G is used to
> register / deregister all of the contexts simultaneously.
> 
> Subsequent patches in the series will implement the pinning / unpinning
> operations for parent / child contexts.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c   | 29 +++
>  drivers/gpu/drm/i915/gt/intel_context.h   | 18 
>  drivers/gpu/drm/i915/gt/intel_context_types.h | 12 
>  3 files changed, 59 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index 745e84c72c90..8cb92b10b547 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -395,6 +395,8 @@ intel_context_init(struct intel_context *ce, struct 
> intel_engine_cs *engine)
>   spin_lock_init(&ce->guc_state.lock);
>   INIT_LIST_HEAD(&ce->guc_state.fences);
>  
> + INIT_LIST_HEAD(&ce->guc_child_list);
> +
>   spin_lock_init(&ce->guc_active.lock);
>   INIT_LIST_HEAD(&ce->guc_active.requests);
>  
> @@ -414,10 +416,17 @@ intel_context_init(struct intel_context *ce, struct 
> intel_engine_cs *engine)
>  
>  void intel_context_fini(struct intel_context *ce)
>  {
> + struct intel_context *child, *next;
> +
>   if (ce->timeline)
>   intel_timeline_put(ce->timeline);
>   i915_vm_put(ce->vm);
>  
> + /* Need to put the creation ref for the children */
> + if (intel_context_is_parent(ce))
> + for_each_child_safe(ce, child, next)
> + intel_context_put(child);
> +
>   mutex_destroy(&ce->pin_mutex);
>   i915_active_fini(&ce->active);
>  }
> @@ -533,6 +542,26 @@ struct i915_request 
> *intel_context_find_active_request(struct intel_context *ce)
>   return active;
>  }
>  
> +void intel_context_bind_parent_child(struct intel_context *parent,
> +  struct intel_context *child)
> +{
> + /*
> +  * Callers responsibility to validate that this function is used
> +  * correctly but we use GEM_BUG_ON here ensure that they do.
> +  */
> + GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
> + GEM_BUG_ON(intel_context_is_pinned(parent));
> + GEM_BUG_ON(intel_context_is_child(parent));
> + GEM_BUG_ON(intel_context_is_pinned(child));
> + GEM_BUG_ON(intel_context_is_child(child));
> + GEM_BUG_ON(intel_context_is_parent(child));
> +
> + parent->guc_number_children++;
> + list_add_tail(&child->guc_child_link,
> +   &parent->guc_child_list);
> + child->parent = parent;
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftest_context.c"
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index c41098950746..ad6ce5ac4824 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -44,6 +44,24 @@ void intel_context_free(struct intel_context *ce);
>  int intel_context_reconfigure_sseu(struct intel_context *ce,
>  const struct intel_sseu sseu);
>  
> +static inline bool intel_context_is_child(struct intel_context *ce)
> +{
> + return !!ce->parent;
> +}
> +
> +static inline bool intel_context_is_parent(struct intel_context *ce)
> +{
> + return !!ce->guc_number_children;
> +}
> +
> +void intel_context_bind_parent_child(struct intel_context *parent,
> +  struct intel_context *child);
> +
> +#define for_each_child(parent, ce)\
> + list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
> +#define for_each_child_safe(parent, ce, cn)\
> + list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, 
> guc_child_link)
> +
>  /**
>   * intel_context_lock_pinned - Stablises the 'pinned' status of the HW 
> context
>   * @ce - the context
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 2df79ba39867..66b22b370a72 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -202,6 +202,18 @@ struct intel_context {
>   /* GuC context blocked fence */
>   struct i915_sw_fence guc_blocked;
>  
> + /* Head of children list or link in parent's children list */

Kerneldoc layout would be nice, plus explaining when exactly this is
set or the list empty (e.g. guc_child_list is non-empty if and only if
guc_number_children > 0 and parent == NULL).

Also mentioning t

Re: [PATCH v6 4/7] drm/mediatek: adjust to the alphabetic order for mediatek-drm

2021-08-09 Thread Chun-Kuang Hu
Hi, Jason:

jason-jh.lin  於 2021年8月6日 週五 上午4:52寫道:
>
> 1. Adjust to the alphabetic order for the define, function, struct
>and array in mediatek-drm driver
> 2. Remove the unused define in mtk_drm_ddp_comp.c

Separate the 2nd part into another patch.

>
> Signed-off-by: jason-jh.lin 
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 180 +---
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |  22 +--
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c  |  76 -
>  3 files changed, 133 insertions(+), 145 deletions(-)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> index 75bc00e17fc4..328ee19f931e 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> @@ -20,50 +20,36 @@
>  #include "mtk_drm_ddp_comp.h"
>  #include "mtk_drm_crtc.h"
>
> -#define DISP_OD_EN 0x
> -#define DISP_OD_INTEN  0x0008
> -#define DISP_OD_INTSTA 0x000c
> -#define DISP_OD_CFG0x0020
> -#define DISP_OD_SIZE   0x0030
> -#define DISP_DITHER_5  0x0114
> -#define DISP_DITHER_7  0x011c
> -#define DISP_DITHER_15 0x013c
> -#define DISP_DITHER_16 0x0140
> -
> -#define DISP_REG_UFO_START 0x
> -
> -#define DISP_AAL_EN0x
> -#define DISP_AAL_SIZE  0x0030
> +#define DISP_REG_AAL_EN0x
> +#define AAL_EN BIT(0)
> +#define DISP_REG_AAL_SIZE  0x0030
>
> -#define DISP_DITHER_EN 0x
> +#define DISP_REG_DITHER_EN 0x

I think we should not change the register names just for alphabetic
order. We list the registers in the order of their addresses.
If you have another reason to change the register names, add another patch
to do this.

Regards,
Chun-Kuang.

>  #define DITHER_EN  BIT(0)
> -#define DISP_DITHER_CFG0x0020
> +#define DISP_REG_DITHER_CFG0x0020
>  #define DITHER_RELAY_MODE  BIT(0)
>  #define DITHER_ENGINE_EN   BIT(1)
> -#define DISP_DITHER_SIZE   0x0030
> -
> -#define LUT_10BIT_MASK 0x03ff
> -
> -#define OD_RELAYMODE   BIT(0)
> -
> -#define UFO_BYPASS BIT(2)
> -
> -#define AAL_EN BIT(0)
> -
>  #define DISP_DITHERING BIT(2)
> +#define DISP_REG_DITHER_SIZE   0x0030
> +#define DISP_REG_DITHER_5  0x0114
> +#define DISP_REG_DITHER_7  0x011c
> +#define DISP_REG_DITHER_15 0x013c
>  #define DITHER_LSB_ERR_SHIFT_R(x)  (((x) & 0x7) << 28)
> -#define DITHER_OVFLW_BIT_R(x)  (((x) & 0x7) << 24)
>  #define DITHER_ADD_LSHIFT_R(x) (((x) & 0x7) << 20)
> -#define DITHER_ADD_RSHIFT_R(x) (((x) & 0x7) << 16)
>  #define DITHER_NEW_BIT_MODEBIT(0)
> +#define DISP_REG_DITHER_16 0x0140
>  #define DITHER_LSB_ERR_SHIFT_B(x)  (((x) & 0x7) << 28)
> -#define DITHER_OVFLW_BIT_B(x)  (((x) & 0x7) << 24)
>  #define DITHER_ADD_LSHIFT_B(x) (((x) & 0x7) << 20)
> -#define DITHER_ADD_RSHIFT_B(x) (((x) & 0x7) << 16)
>  #define DITHER_LSB_ERR_SHIFT_G(x)  (((x) & 0x7) << 12)
> -#define DITHER_OVFLW_BIT_G(x)  (((x) & 0x7) << 8)
>  #define DITHER_ADD_LSHIFT_G(x) (((x) & 0x7) << 4)
> -#define DITHER_ADD_RSHIFT_G(x) (((x) & 0x7) << 0)
> +
> +#define DISP_REG_OD_EN 0x
> +#define DISP_REG_OD_CFG0x0020
> +#define OD_RELAYMODE   BIT(0)
> +#define DISP_REG_OD_SIZE   0x0030
> +
> +#define DISP_REG_UFO_START 0x
> +#define UFO_BYPASS BIT(2)
>
>  struct mtk_ddp_comp_dev {
> struct clk *clk;
> @@ -116,20 +102,6 @@ void mtk_ddp_write_mask(struct cmdq_pkt *cmdq_pkt, 
> unsigned int value,
>  #endif
>  }
>
> -static int mtk_ddp_clk_enable(struct device *dev)
> -{
> -   struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> -
> -   return clk_prepare_enable(priv->clk);
> -}
> -
> -static void mtk_ddp_clk_disable(struct device *dev)
> -{
> -   struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> -
> -   clk_disable_unprepare(priv->clk);
> -}
> -
>  void mtk_dither_set_common(void __iomem *regs, struct cmdq_client_reg 
> *cmdq_reg,
>unsigned int bpc, unsigned int cfg,
> 

Re: [PATCH 14/46] drm/i915: Expose logical engine instance to user

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:11PM -0700, Matthew Brost wrote:
> Expose logical engine instance to user via the query engine info IOCTL. This
> is required for split-frame workloads as these need to be placed on
> engines in a logically contiguous order. The logical mapping can change
> based on fusing. Rather than requiring the user to have knowledge of the
> fusing, we simply expose the logical mapping with the existing query engine
> info IOCTL.
> 
> Cc: Tvrtko Ursulin 
> Signed-off-by: Matthew Brost 

Uapi must have a link to the userspace MR/patch set using this, and to the
igt patch set validating it.

Ideally in each patch, since unfortunately it's way too hard to find the
cover letter later on.

Jason even went as far as making this a hard requirement because he wasted
a bit too much time trying to find the userspace for new uapi:

https://lore.kernel.org/dri-devel/20210804185704.624883-1-ja...@jlekstrand.net/

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/i915_query.c | 2 ++
>  include/uapi/drm/i915_drm.h   | 8 +++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index e49da36c62fb..8a72923fbdba 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
>   for_each_uabi_engine(engine, i915) {
>   info.engine.engine_class = engine->uabi_class;
>   info.engine.engine_instance = engine->uabi_instance;
> + info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
>   info.capabilities = engine->uabi_capabilities;
> + info.logical_instance = ilog2(engine->logical_mask);
>  
>   if (copy_to_user(info_ptr, &info, sizeof(info)))
>   return -EFAULT;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 7f13d241417f..ef72e07fe08c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2706,14 +2706,20 @@ struct drm_i915_engine_info {
>  
>   /** @flags: Engine flags. */
>   __u64 flags;
> +#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE(1 << 0)
>  
>   /** @capabilities: Capabilities of this engine. */
>   __u64 capabilities;
>  #define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0)
>  #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC  (1 << 1)
>  
> + /** @logical_instance: Logical instance of engine */
> + __u16 logical_instance;
> +
>   /** @rsvd1: Reserved fields. */
> - __u64 rsvd1[4];
> + __u16 rsvd1[3];
> + /** @rsvd2: Reserved fields. */
> + __u64 rsvd2[3];
>  };
>  
>  /**
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 13/46] drm/i915: Add logical engine mapping

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:10PM -0700, Matthew Brost wrote:
> Add logical engine mapping. This is required for split-frame, as
> workloads need to be placed on engines in a logically contiguous manner.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 60 ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
>  .../drm/i915/gt/intel_execlists_submission.c  |  1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  2 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 21 +--
>  5 files changed, 56 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 0d9105a31d84..4d790f9a65dd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -290,7 +290,8 @@ static void nop_irq_handler(struct intel_engine_cs 
> *engine, u16 iir)
>   GEM_DEBUG_WARN_ON(iir);
>  }
>  
> -static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> +static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
> +   u8 logical_instance)
>  {
>   const struct engine_info *info = &intel_engines[id];
>   struct drm_i915_private *i915 = gt->i915;
> @@ -334,6 +335,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> intel_engine_id id)
>  
>   engine->class = info->class;
>   engine->instance = info->instance;
> + engine->logical_mask = BIT(logical_instance);
>   __sprint_engine_name(engine);
>  
>   engine->props.heartbeat_interval_ms =
> @@ -572,6 +574,37 @@ static intel_engine_mask_t init_engine_mask(struct 
> intel_gt *gt)
>   return info->engine_mask;
>  }
>  
> +static void populate_logical_ids(struct intel_gt *gt, u8 *logical_ids,
> +  u8 class, const u8 *map, u8 num_instances)
> +{
> + int i, j;
> + u8 current_logical_id = 0;
> +
> + for (j = 0; j < num_instances; ++j) {
> + for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> + if (!HAS_ENGINE(gt, i) ||
> + intel_engines[i].class != class)
> + continue;
> +
> + if (intel_engines[i].instance == map[j]) {
> + logical_ids[intel_engines[i].instance] =
> + current_logical_id++;
> + break;
> + }
> + }
> + }
> +}
> +
> +static void setup_logical_ids(struct intel_gt *gt, u8 *logical_ids, u8 class)
> +{
> + int i;
> + u8 map[MAX_ENGINE_INSTANCE + 1];
> +
> + for (i = 0; i < MAX_ENGINE_INSTANCE + 1; ++i)
> + map[i] = i;
> + populate_logical_ids(gt, logical_ids, class, map, ARRAY_SIZE(map));
> +}
> +
>  /**
>   * intel_engines_init_mmio() - allocate and prepare the Engine Command 
> Streamers
>   * @gt: pointer to struct intel_gt
> @@ -583,7 +616,8 @@ int intel_engines_init_mmio(struct intel_gt *gt)
>   struct drm_i915_private *i915 = gt->i915;
>   const unsigned int engine_mask = init_engine_mask(gt);
>   unsigned int mask = 0;
> - unsigned int i;
> + unsigned int i, class;
> + u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
>   int err;
>  
>   drm_WARN_ON(&i915->drm, engine_mask == 0);
> @@ -593,15 +627,23 @@ int intel_engines_init_mmio(struct intel_gt *gt)
>   if (i915_inject_probe_failure(i915))
>   return -ENODEV;
>  
> - for (i = 0; i < ARRAY_SIZE(intel_engines); i++) {
> - if (!HAS_ENGINE(gt, i))
> - continue;
> + for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
> + setup_logical_ids(gt, logical_ids, class);
>  
> - err = intel_engine_setup(gt, i);
> - if (err)
> - goto cleanup;
> + for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
> + u8 instance = intel_engines[i].instance;
> +
> + if (intel_engines[i].class != class ||
> + !HAS_ENGINE(gt, i))
> + continue;
>  
> - mask |= BIT(i);
> + err = intel_engine_setup(gt, i,
> +  logical_ids[instance]);
> + if (err)
> + goto cleanup;
> +
> + mask |= BIT(i);
> + }
>   }
>  
>   /*
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index ed91bcff20eb..85e5c9a9e502 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -266,6 +266,7 @@ struct intel_engine_cs {
>   unsigned int guc_id;
>  
>   intel_engine_mask_t mask;
> + intel_engine_mask_t logical_mask;

Kerneldoc at least for new stuff. Bonus poi

Re: [Intel-gfx] [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:08PM -0700, Matthew Brost wrote:
> Calling switch_to_kernel_context isn't needed if the engine PM reference
> is taken while all contexts are pinned. By not calling
> switch_to_kernel_context we save on issuing a request to the engine.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 1f07ac4e0672..58099de6bf07 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -162,6 +162,10 @@ static bool switch_to_kernel_context(struct 
> intel_engine_cs *engine)
>   unsigned long flags;
>   bool result = true;
>  
> + /* No need to switch_to_kernel_context if GuC submission */

Maybe whack a big FIXME on here that we should unravel this properly.
Currently the execlist backend assumptions are leaked all over the place,
leading to stuff like this. Which means extremely fragile code.

I currently don't have a great idea on how exactly we should do that, but
oh well.

btw just in case we ever want to make guc lrc properly evictable (which was
the og use-case for this function, way, way back), would we need to fully
unregister them from guc? At least I'm assuming there's no other trick
like the below one.

Another aside: How does the perf/OA patching work on GuC?

Anyway, patch looks legit:

Reviewed-by: Daniel Vetter 


> + if (intel_engine_uses_guc(engine))
> + return true;
> +
>   /* GPU is pointing to the void, as good as in the kernel context. */
>   if (intel_gt_is_wedged(engine->gt))
>   return true;
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-08-09 Thread Daniel Vetter
On Tue, Aug 03, 2021 at 03:29:07PM -0700, Matthew Brost wrote:
> Taking a PM reference to prevent intel_gt_wait_for_idle from short
> circuiting while scheduling of a user context could be enabled.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/Makefile |  1 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
>  2 files changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 903de270f2db..5e3a1e2095b0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -103,6 +103,7 @@ gt-y += \
>   gt/intel_gt_clock_utils.o \
>   gt/intel_gt_irq.o \
>   gt/intel_gt_pm.o \
> + gt/intel_gt_pm_unpark_work.o \

This file isn't here?

Also pm stuff tends to have very nasty locking requirements, doing special
stuff like this in the backend tends to lead to really big surprises. I
think two options to make sure our locking design stays consistent:
- Lift this to generic code.
- expose some engine_pm_might_get/put() calls which do have the right set
  of might_lock annotations, and call those in the generic code.

Imo the worst kernel abstractions are those where all implementations
look&act the same, except for locking. Unfortunately i915-gem code is full
of this stuff, and we need to stop this by enlisting lockdep to check the
contracts for us.
-Daniel
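The second option above (wrappers carrying might_lock annotations) can be sketched in plain userspace C. All names and the toy lock below are invented for illustration; the real i915 engine-PM API and lockdep's might_lock() differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace sketch of the might_lock idea: the wrapper asserts the
 * locking contract on every call, even on paths that happen not to
 * need the lock this time, so contract violations surface early.
 * All names here are invented, not the real i915 API. */

static bool pm_lock_held;
static int pm_refcount;

static void pm_lock(void)   { assert(!pm_lock_held); pm_lock_held = true; }
static void pm_unlock(void) { assert(pm_lock_held);  pm_lock_held = false; }

static void engine_pm_might_get(void)
{
	/* might_lock() analogue: prove the lock may be taken here,
	 * by briefly taking and releasing it. */
	pm_lock();
	pm_unlock();
}

static void engine_pm_get(void)
{
	engine_pm_might_get();
	pm_lock();
	pm_refcount++;
	pm_unlock();
}

static void engine_pm_put(void)
{
	pm_lock();
	assert(pm_refcount > 0);
	pm_refcount--;
	pm_unlock();
}
```

The point of the annotation wrapper is that generic code calling engine_pm_get() exercises the locking contract on every path, not just the ones a particular backend happens to hit.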

>   gt/intel_gt_pm_irq.o \
>   gt/intel_gt_requests.o \
>   gt/intel_gtt.o \
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 7fe4d1559a81..c5d9548bfd00 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2056,7 +2056,12 @@ static int guc_context_pre_pin(struct intel_context 
> *ce,
>  
>  static int guc_context_pin(struct intel_context *ce, void *vaddr)
>  {
> - return __guc_context_pin(ce, ce->engine, vaddr);
> + int ret = __guc_context_pin(ce, ce->engine, vaddr);
> +
> + if (likely(!ret && !intel_context_is_barrier(ce)))
> + intel_engine_pm_get(ce->engine);
> +
> + return ret;
>  }
>  
>  static void guc_context_unpin(struct intel_context *ce)
> @@ -2067,6 +2072,9 @@ static void guc_context_unpin(struct intel_context *ce)
>  
>   unpin_guc_id(guc, ce, true);
>   lrc_unpin(ce);
> +
> + if (likely(!intel_context_is_barrier(ce)))
> + intel_engine_pm_put(ce->engine);
>  }
>  
>  static void guc_context_post_unpin(struct intel_context *ce)
> @@ -3002,8 +3010,30 @@ static int guc_virtual_context_pre_pin(struct 
> intel_context *ce,
>  static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
>  {
>   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> + int ret = __guc_context_pin(ce, engine, vaddr);
> + intel_engine_mask_t tmp, mask = ce->engine->mask;
> +
> + if (likely(!ret))
> + for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> + intel_engine_pm_get(engine);
>  
> - return __guc_context_pin(ce, engine, vaddr);
> + return ret;
> +}
> +
> +static void guc_virtual_context_unpin(struct intel_context *ce)
> +{
> + intel_engine_mask_t tmp, mask = ce->engine->mask;
> + struct intel_engine_cs *engine;
> + struct intel_guc *guc = ce_to_guc(ce);
> +
> + GEM_BUG_ON(context_enabled(ce));
> + GEM_BUG_ON(intel_context_is_barrier(ce));
> +
> + unpin_guc_id(guc, ce, true);
> + lrc_unpin(ce);
> +
> + for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> + intel_engine_pm_put(engine);
>  }
>  
>  static void guc_virtual_context_enter(struct intel_context *ce)
> @@ -3040,7 +3070,7 @@ static const struct intel_context_ops 
> virtual_guc_context_ops = {
>  
>   .pre_pin = guc_virtual_context_pre_pin,
>   .pin = guc_virtual_context_pin,
> - .unpin = guc_context_unpin,
> + .unpin = guc_virtual_context_unpin,
>   .post_unpin = guc_context_post_unpin,
>  
>   .ban = guc_context_ban,
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/mediatek: Test component initialization earlier in the function mtk_drm_crtc_create

2021-08-09 Thread Chun-Kuang Hu
Hi, Dafna:

Dafna Hirschfeld  wrote on Tuesday, July 13, 2021 at 2:12 AM:
>
> The initialization is currently tested in a later stage in
> the function for no reason.
> In addition, the test '!comp' will never fail since comp is
> set with the '&' operator. Instead, test if a comp was not
> initialized by testing "!comp->dev".
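The "'!comp' will never fail" point in the commit message can be reproduced in a tiny standalone sketch; the struct and lookup below are invented stand-ins for the mediatek driver's types:

```c
#include <assert.h>
#include <stddef.h>

struct comp {
	void *dev;	/* stands in for the mtk_ddp_comp device pointer */
};

static struct comp ddp_comp[4];	/* stands in for priv->ddp_comp[] */

static struct comp *lookup_comp(int id)
{
	/* The address of an array element is never NULL, so a
	 * subsequent "if (!comp)" test is dead code.  The member is
	 * what can legitimately be unset, hence the patch checks
	 * comp->dev instead. */
	return &ddp_comp[id];
}
```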

Applied to mediatek-drm-next [1], thanks.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux.git/log/?h=mediatek-drm-next

Regards,
Chun-Kuang.

>
> Signed-off-by: Dafna Hirschfeld 
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> index 474efb844249..06f40e589922 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> @@ -755,14 +755,22 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev,
> for (i = 0; i < path_len; i++) {
> enum mtk_ddp_comp_id comp_id = path[i];
> struct device_node *node;
> +   struct mtk_ddp_comp *comp;
>
> node = priv->comp_node[comp_id];
> +   comp = &priv->ddp_comp[comp_id];
> +
> if (!node) {
> dev_info(dev,
>  "Not creating crtc %d because component %d 
> is disabled or missing\n",
>  pipe, comp_id);
> return 0;
> }
> +
> +   if (!comp->dev) {
> +   dev_err(dev, "Component %pOF not initialized\n", 
> node);
> +   return -ENODEV;
> +   }
> }
>
> mtk_crtc = devm_kzalloc(dev, sizeof(*mtk_crtc), GFP_KERNEL);
> @@ -787,16 +795,8 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev,
> for (i = 0; i < mtk_crtc->ddp_comp_nr; i++) {
> enum mtk_ddp_comp_id comp_id = path[i];
> struct mtk_ddp_comp *comp;
> -   struct device_node *node;
>
> -   node = priv->comp_node[comp_id];
> comp = &priv->ddp_comp[comp_id];
> -   if (!comp) {
> -   dev_err(dev, "Component %pOF not initialized\n", 
> node);
> -   ret = -ENODEV;
> -   return ret;
> -   }
> -
> mtk_crtc->ddp_comp[i] = comp;
>
> if (comp->funcs) {
> --
> 2.17.1
>


Re: [Intel-gfx] [PATCH] fbdev/efifb: Release PCI device's runtime PM ref during FB destroy

2021-08-09 Thread Daniel Vetter
On Sat, Aug 07, 2021 at 06:21:10PM +0300, Imre Deak wrote:
> On Thu, Aug 05, 2021 at 12:23:21AM +0200, Daniel Vetter wrote:
> > On Mon, Aug 02, 2021 at 04:35:51PM +0300, Imre Deak wrote:
> > > Atm the EFI FB driver gets a runtime PM reference for the associated GFX
> > > PCI device during driver probing and releases it only when removing the
> > > driver.
> > > 
> > > When fbcon switches to the FB provided by the PCI device's driver (for
> > > instance i915/drmfb), the EFI FB will get only unregistered without the
> > > EFI FB driver getting unloaded, keeping the runtime PM reference
> > > acquired during driver probing. This reference will prevent the PCI
> > > driver from runtime suspending the device.
> > > 
> > > Fix this by releasing the RPM reference from the EFI FB's destroy hook,
> > > called when the FB gets unregistered.
> > > 
> > > Fixes: a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at 
> > > PCI D0")
> > > Cc: Kai-Heng Feng 
> > > Signed-off-by: Imre Deak 
> > 
> > Patch looks good:
> > 
> > Reviewed-by: Daniel Vetter 
> > 
> > But I've found a bunch of ordering issues here:
> > - we should probably get the runtime pm reference _before_ we register the
> >   framebuffer. There's a race right now there.
> 
> Yea, missed this, will send a v2 moving it earlier.
> 
> > - the sysfs_remove_groups and framebuffer_release should also be moved
> >   into the destroy callback. This is more a leak type of situation.
> 
> Those sysfs entries belong to the efifb platform device, showing the
> bootup screen_info.lfb_* info, not related to fb_info, so imo
> efifb_remove() is the correct place to remove those. But yes, freeing
> fb_info seems to belong to fb_destroy().

Ah ok. Might be good to put a comment down that this isn't tied to fb_info
lifetime.

> Atm, things will blow up when unbinding the efifb device after the efifb
> framebuffer was unregistered while removing it as a conflicting FB
> (since unregister_framebuffer() will be called twice), so that would
> need to be solved as well. Maybe remove_conflicting_pci_framebuffers()
> could unregister the platform device instead of only unregistering the
> framebuffer, similarly to drm_aperture_detach_firmware(), but haven't
> checked this in more detail.

Yeah either that, or a double-unregister check (plus correct refcount) in
unregister_framebuffer. Ideally with a check so that only the
double-unregister from remove_conflicting_pci_framebuffers is caught, and
not a driver that accidentally unregisters the fbdev twice.

Even better if this would be all devm_ wrapped so it's idiot proof.
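A minimal sketch of such a double-unregister guard, with a flag standing in for the proposed check (the real unregister_framebuffer() has no such mechanism today; all names here are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_fb {
	bool registered;
	bool removed_as_conflicting;	/* set by the takeover path */
};

static int fake_unregister(struct fake_fb *fb, bool from_takeover)
{
	if (!fb->registered) {
		/* Second unregister: fine if the conflicting-fb
		 * takeover already ran, a driver bug (worth a WARN)
		 * otherwise. */
		if (!fb->removed_as_conflicting)
			fprintf(stderr, "fbdev: double unregister!\n");
		return -1;
	}
	fb->registered = false;
	if (from_takeover)
		fb->removed_as_conflicting = true;
	return 0;
}
```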

I think generally I'd say "let's not invest in fbdev", but a) these
hotremove/unload bugs have been hurting us since forever, and b) efifb
seems to be bound to stay around for a very long time - the simpldrmfb
stuff isn't really moving forward very fast.

Anyway, would be good to get this all sorted eventually.
-Daniel

> 
> > Cheers, Daniel
> > 
> > > ---
> > >  drivers/video/fbdev/efifb.c | 8 +---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> > > index 8ea8f079cde26..25cdea32b9633 100644
> > > --- a/drivers/video/fbdev/efifb.c
> > > +++ b/drivers/video/fbdev/efifb.c
> > > @@ -47,6 +47,8 @@ static bool use_bgrt = true;
> > >  static bool request_mem_succeeded = false;
> > >  static u64 mem_flags = EFI_MEMORY_WC | EFI_MEMORY_UC;
> > >  
> > > +static struct pci_dev *efifb_pci_dev;/* dev with BAR covering the 
> > > efifb */
> > > +
> > >  static struct fb_var_screeninfo efifb_defined = {
> > >   .activate   = FB_ACTIVATE_NOW,
> > >   .height = -1,
> > > @@ -243,6 +245,9 @@ static inline void efifb_show_boot_graphics(struct 
> > > fb_info *info) {}
> > >  
> > >  static void efifb_destroy(struct fb_info *info)
> > >  {
> > > + if (efifb_pci_dev)
> > > + pm_runtime_put(&efifb_pci_dev->dev);
> > > +
> > >   if (info->screen_base) {
> > >   if (mem_flags & (EFI_MEMORY_UC | EFI_MEMORY_WC))
> > >   iounmap(info->screen_base);
> > > @@ -333,7 +338,6 @@ ATTRIBUTE_GROUPS(efifb);
> > >  
> > >  static bool pci_dev_disabled;/* FB base matches BAR of a disabled 
> > > device */
> > >  
> > > -static struct pci_dev *efifb_pci_dev;/* dev with BAR covering the 
> > > efifb */
> > >  static struct resource *bar_resource;
> > >  static u64 bar_offset;
> > >  
> > > @@ -603,8 +607,6 @@ static int efifb_remove(struct platform_device *pdev)
> > >   unregister_framebuffer(info);
> > >   sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
> > >   framebuffer_release(info);
> > > - if (efifb_pci_dev)
> > > - pm_runtime_put(&efifb_pci_dev->dev);
> > >  
> > >   return 0;
> > >  }
> > > -- 
> > > 2.27.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability

2021-08-09 Thread Daniel Vetter
On Fri, Aug 06, 2021 at 07:27:13AM +, Kasireddy, Vivek wrote:
> Hi Daniel,
> 
> > > > > >>> The solution:
> > > > > >>> - To ensure full framerate, the Guest compositor has to start 
> > > > > >>> it's repaint cycle
> > > > (including
> > > > > >>> the 9 ms wait) when the Host compositor sends the frame callback 
> > > > > >>> event to its
> > > > clients.
> > > > > >>> In order for this to happen, the dma-fence that the Guest KMS 
> > > > > >>> waits on -- before
> > > > sending
> > > > > >>> pageflip completion -- cannot be tied to a wl_buffer.release 
> > > > > >>> event. This means
> > that,
> > > > the
> > > > > >>> Guest compositor has to be forced to use a new buffer for its 
> > > > > >>> next repaint cycle
> > > > when it
> > > > > >>> gets a pageflip completion.
> > > > > >>
> > > > > >> Is that really the only solution?
> > > > > > [Kasireddy, Vivek] There are a few others I mentioned here:
> > > > > > https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
> > > > > > But I think none of them are as compelling as this one.
> > > > > >
> > > > > >>
> > > > > >> If we fix the event timestamps so that both guest and host use the 
> > > > > >> same
> > > > > >> timestamp, but then the guest starts 5ms (or something like that) 
> > > > > >> earlier,
> > > > > >> then things should work too? I.e.
> > > > > >> - host compositor starts at (previous_frametime + 9ms)
> > > > > >> - guest compositor starts at (previous_frametime + 4ms)
> > > > > >>
> > > > > >> Ofc this only works if the frametimes we hand out to both match 
> > > > > >> _exactly_
> > > > > >> and are as high-precision as the ones on the host side. Which for 
> > > > > >> many gpu
> > > > > >> drivers at least is the case, and all the ones you care about for 
> > > > > >> sure :-)
> > > > > >>
> > > > > >> But if the frametimes the guest receives are the no_vblank fake 
> > > > > >> ones, then
> > > > > >> they'll be all over the place and this carefully tuned low-latency 
> > > > > >> redraw
> > > > > >> loop falls apart. Aside fromm the fact that without tuning the 
> > > > > >> guests to
> > > > > >> be earlier than the hosts, you're guaranteed to miss every frame 
> > > > > >> (except
> > > > > >> when the timing wobbliness in the guest is big enough by chance to 
> > > > > >> make
> > > > > >> the deadline on the oddball frame).
> > > > > > [Kasireddy, Vivek] The Guest and Host use different event 
> > > > > > timestamps as we don't
> > > > > > share these between the Guest and the Host. It does not seem to be 
> > > > > > causing any
> > other
> > > > > > problems so far but we did try the experiment you mentioned (i.e., 
> > > > > > adjusting the
> > > > delays)
> > > > > > and it works. However, this patch series is meant to fix the issue 
> > > > > > without having to
> > > > tweak
> > > > > > anything (delays) because we can't do this for every compositor out 
> > > > > > there.
> > > > >
> > > > > Maybe there could be a mechanism which allows the compositor in the 
> > > > > guest to
> > > > automatically adjust its repaint cycle as needed.
> > > > >
> > > > > This might even be possible without requiring changes in each 
> > > > > compositor, by
> > adjusting
> > > > the vertical blank periods in the guest to be aligned with the host 
> > > > compositor repaint
> > > > cycles. Not sure about that though.
> > > > >
> > > > > Even if not, both this series or making it possible to queue multiple 
> > > > > flips require
> > > > corresponding changes in each compositor as well to have any effect.
> > > >
> > > > Yeah from all the discussions and tests done it sounds even with a
> > > > deeper queue we have big coordination issues between the guest and
> > > > host compositor (like the example that the guest is now rendering at
> > > > 90fps instead of 60fps like the host).
> > > [Kasireddy, Vivek] Oh, I think you are referring to my reply to Gerd. 
> > > That 90 FPS vs
> > > 60 FPS problem is a completely different issue that is associated with 
> > > Qemu GTK UI
> > > backend. With the GTK backend -- and also with SDL backend -- we Blit the 
> > > Guest
> > > scanout FB onto one of the backbuffers managed by EGL.
> > >
> > > I am trying to add a new Qemu Wayland UI backend so that we can eliminate 
> > > that Blit
> > > and thereby have a truly zero-copy solution. And, this is there I am 
> > > running into the
> > > halved frame-rate issue -- the current problem.
> > 
> > Yes, that's what I referenced. But I disagree that it's a different
> > problem. The underlying problem in both cases is that the guest and host
> > compositor free-wheel instead of rendering in sync. It's just that
> > depending upon how exactly the flip completion event on the gues side
> > plays out you either get guest rendering that's faster than the host-side
> > 60fps, or guest rendering that's much slower than the host-side 60fps.
> [Kasireddy, Vivek] That used to be the case before we added a synchronization
> mechanism to the GTK UI

Re: [PATCH v2 2/5] dt-bindings: display: mediatek: dsi: add documentation for MT8167 SoC

2021-08-09 Thread Chun-Kuang Hu
Hi, Fabien:

Fabien Parent  wrote on Friday, October 23, 2020 at 9:31 PM:
>
> Add binding documentation for the MT8167 SoC.

Applied to mediatek-drm-next [1], thanks.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux.git/log/?h=mediatek-drm-next

Regards,
Chun-Kuang.

>
> Signed-off-by: Fabien Parent 
> ---
>
> Changelog:
>
> V2: removed part that added a new clock
>
>  .../devicetree/bindings/display/mediatek/mediatek,dsi.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
> index f06f24d405a5..6a10de812158 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt
> @@ -7,7 +7,7 @@ channel output.
>
>  Required properties:
>  - compatible: "mediatek,-dsi"
> -- the supported chips are mt2701, mt7623, mt8173 and mt8183.
> +- the supported chips are mt2701, mt7623, mt8167, mt8173 and mt8183.
>  - reg: Physical base address and length of the controller's registers
>  - interrupts: The interrupt signal from the function block.
>  - clocks: device clocks
> @@ -26,7 +26,7 @@ The MIPI TX configuration module controls the MIPI D-PHY.
>
>  Required properties:
>  - compatible: "mediatek,-mipi-tx"
> -- the supported chips are mt2701, 7623, mt8173 and mt8183.
> +- the supported chips are mt2701, 7623, mt8167, mt8173 and mt8183.
>  - reg: Physical base address and length of the controller's registers
>  - clocks: PLL reference clock
>  - clock-output-names: name of the output clock line to the DSI encoder
> --
> 2.28.0
>


Re: linux-next: Signed-off-by missing for commit in the drm-intel tree

2021-08-09 Thread Daniel Vetter
On Fri, Aug 06, 2021 at 09:36:56AM +0300, Joonas Lahtinen wrote:
> Hi Matt,
> 
> Always use the dim tooling when applying patches, it will do the right
> thing with regards to adding the S-o-b.

fd.o server rejects any pushes that haven't been done by dim, so how did
this get through? Matt, can you pls figure out and type up the patch to
plug that hole?

Thanks, Daniel

> 
> Regards, Joonas
> 
> Quoting Stephen Rothwell (2021-07-15 07:18:54)
> > Hi all,
> > 
> > Commit
> > 
> >   db47fe727e1f ("drm/i915/step: s/_revid_tbl/_revids")
> > 
> > is missing a Signed-off-by from its committer.
> > 
> > -- 
> > Cheers,
> > Stephen Rothwell

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 5/5] drm/mediatek: Add support for main DDP path on MT8167

2021-08-09 Thread Chun-Kuang Hu
Hi, Fabien:

Fabien Parent  wrote on Friday, October 23, 2020 at 9:31 PM:
>
> Add the main (DSI) drm display path for MT8167.
>

Applied to mediatek-drm-next [1], thanks.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux.git/log/?h=mediatek-drm-next

Regards,
Chun-Kuang.

> Signed-off-by: Fabien Parent 
> ---
>
> Changelog:
>
> V2: No change
>
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c | 38 ++
>  1 file changed, 38 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> index 59c85c63b7cc..3952435093fe 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
> @@ -112,6 +112,17 @@ static const enum mtk_ddp_comp_id mt2712_mtk_ddp_third[] 
> = {
> DDP_COMPONENT_PWM2,
>  };
>
> +static enum mtk_ddp_comp_id mt8167_mtk_ddp_main[] = {
> +   DDP_COMPONENT_OVL0,
> +   DDP_COMPONENT_COLOR0,
> +   DDP_COMPONENT_CCORR,
> +   DDP_COMPONENT_AAL0,
> +   DDP_COMPONENT_GAMMA,
> +   DDP_COMPONENT_DITHER,
> +   DDP_COMPONENT_RDMA0,
> +   DDP_COMPONENT_DSI0,
> +};
> +
>  static const enum mtk_ddp_comp_id mt8173_mtk_ddp_main[] = {
> DDP_COMPONENT_OVL0,
> DDP_COMPONENT_COLOR0,
> @@ -163,6 +174,11 @@ static const struct mtk_mmsys_driver_data 
> mt8173_mmsys_driver_data = {
> .ext_len = ARRAY_SIZE(mt8173_mtk_ddp_ext),
>  };
>
> +static const struct mtk_mmsys_driver_data mt8167_mmsys_driver_data = {
> +   .main_path = mt8167_mtk_ddp_main,
> +   .main_len = ARRAY_SIZE(mt8167_mtk_ddp_main),
> +};
> +
>  static int mtk_drm_kms_init(struct drm_device *drm)
>  {
> struct mtk_drm_private *private = drm->dev_private;
> @@ -401,26 +417,42 @@ static const struct component_master_ops mtk_drm_ops = {
>  static const struct of_device_id mtk_ddp_comp_dt_ids[] = {
> { .compatible = "mediatek,mt2701-disp-ovl",
>   .data = (void *)MTK_DISP_OVL },
> +   { .compatible = "mediatek,mt8167-disp-ovl",
> + .data = (void *)MTK_DISP_OVL },
> { .compatible = "mediatek,mt8173-disp-ovl",
>   .data = (void *)MTK_DISP_OVL },
> { .compatible = "mediatek,mt2701-disp-rdma",
>   .data = (void *)MTK_DISP_RDMA },
> +   { .compatible = "mediatek,mt8167-disp-rdma",
> + .data = (void *)MTK_DISP_RDMA },
> { .compatible = "mediatek,mt8173-disp-rdma",
>   .data = (void *)MTK_DISP_RDMA },
> { .compatible = "mediatek,mt8173-disp-wdma",
>   .data = (void *)MTK_DISP_WDMA },
> +   { .compatible = "mediatek,mt8167-disp-ccorr",
> + .data = (void *)MTK_DISP_CCORR },
> { .compatible = "mediatek,mt2701-disp-color",
>   .data = (void *)MTK_DISP_COLOR },
> +   { .compatible = "mediatek,mt8167-disp-color",
> + .data = (void *)MTK_DISP_COLOR },
> { .compatible = "mediatek,mt8173-disp-color",
>   .data = (void *)MTK_DISP_COLOR },
> +   { .compatible = "mediatek,mt8167-disp-aal",
> + .data = (void *)MTK_DISP_AAL},
> { .compatible = "mediatek,mt8173-disp-aal",
>   .data = (void *)MTK_DISP_AAL},
> +   { .compatible = "mediatek,mt8167-disp-gamma",
> + .data = (void *)MTK_DISP_GAMMA, },
> { .compatible = "mediatek,mt8173-disp-gamma",
>   .data = (void *)MTK_DISP_GAMMA, },
> +   { .compatible = "mediatek,mt8167-disp-dither",
> + .data = (void *)MTK_DISP_DITHER },
> { .compatible = "mediatek,mt8173-disp-ufoe",
>   .data = (void *)MTK_DISP_UFOE },
> { .compatible = "mediatek,mt2701-dsi",
>   .data = (void *)MTK_DSI },
> +   { .compatible = "mediatek,mt8167-dsi",
> + .data = (void *)MTK_DSI },
> { .compatible = "mediatek,mt8173-dsi",
>   .data = (void *)MTK_DSI },
> { .compatible = "mediatek,mt2701-dpi",
> @@ -431,10 +463,14 @@ static const struct of_device_id mtk_ddp_comp_dt_ids[] 
> = {
>   .data = (void *)MTK_DISP_MUTEX },
> { .compatible = "mediatek,mt2712-disp-mutex",
>   .data = (void *)MTK_DISP_MUTEX },
> +   { .compatible = "mediatek,mt8167-disp-mutex",
> + .data = (void *)MTK_DISP_MUTEX },
> { .compatible = "mediatek,mt8173-disp-mutex",
>   .data = (void *)MTK_DISP_MUTEX },
> { .compatible = "mediatek,mt2701-disp-pwm",
>   .data = (void *)MTK_DISP_BLS },
> +   { .compatible = "mediatek,mt8167-disp-pwm",
> + .data = (void *)MTK_DISP_PWM },
> { .compatible = "mediatek,mt8173-disp-pwm",
>   .data = (void *)MTK_DISP_PWM },
> { .compatible = "mediatek,mt8173-disp-od",
> @@ -449,6 +485,8 @@ static const struct of_device_id mtk_drm_of_ids[] = {
>   .data = &mt7623_mmsys_driver_data},
> { .compatible = "mediatek,mt2712-mmsys",
>   .data = &mt2712_mmsys_driver_data},
> +   { .compatible = "mediatek,mt8167-mmsys",
> + .data = &mt8167_mmsys_driver_data},
>  

Re: [Intel-gfx] [PATCH 3/3] drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H

2021-08-09 Thread Daniel Vetter
On Sun, Aug 08, 2021 at 11:07:57AM -0700, Matthew Brost wrote:
> While debugging an issue with full GT resets I went down a rabbit hole
> thinking the scrubbing of lost G2H wasn't working correctly. This proved
> to be incorrect as this was working just fine but this chase inspired me
> to write a selftest to prove that this works. This simple selftest
> injects errors dropping various G2H and then issues a full GT reset
> proving that the scrubbing of these G2H doesn't blow up.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  18 +++
>  drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 126 ++
>  .../drm/i915/selftests/i915_live_selftests.h  |   1 +
>  .../i915/selftests/intel_scheduler_helpers.c  |  12 ++
>  .../i915/selftests/intel_scheduler_helpers.h  |   2 +
>  6 files changed, 163 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index e54351a170e2..fec5ff7ef168 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -198,6 +198,10 @@ struct intel_context {
>*/
>   u8 guc_prio;
>   u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> +

I know the existing stuff isn't following this at all, but for anything
new we really should put some kerneldoc into structures. This probably
means you need to open-code the #ifdef here, since this macro will likely
upset kerneldoc parsing.
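For illustration, a hedged sketch of what the requested kerneldoc could look like; the field names follow the patch, but the struct wrapper and the descriptions are this sketch's guesses at intent, not authoritative documentation:

```c
#include <stdbool.h>

struct guc_selftest_flags {
	/**
	 * @drop_schedule_enable: selftest error injection; eat the
	 * next G2H schedule-enable message to emulate a lost message.
	 */
	bool drop_schedule_enable;
	/**
	 * @drop_schedule_disable: as above, for schedule-disable.
	 */
	bool drop_schedule_disable;
	/**
	 * @drop_deregister: as above, for deregister-done.
	 */
	bool drop_deregister;
};
```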

> + I915_SELFTEST_DECLARE(bool drop_schedule_enable);
> + I915_SELFTEST_DECLARE(bool drop_schedule_disable);
> + I915_SELFTEST_DECLARE(bool drop_deregister);
>  };
>  
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index cd8df078ca87..d13dc56bae43 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2618,6 +2618,11 @@ int intel_guc_deregister_done_process_msg(struct 
> intel_guc *guc,
>  
>   trace_intel_context_deregister_done(ce);
>  
> + if (I915_SELFTEST_ONLY(ce->drop_deregister)) {
> + I915_SELFTEST_DECLARE(ce->drop_deregister = false;)

This macro wrapping is quite nasty, can't we just #ifdef this? Especially
the _DECLARE name really doesn't expect a statement.

Aside from these bikesheds I don't have a much to say on the test logic
itself, since I'm far from knowledgable on guc stuff ...
-Daniel
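A sketch of the open-coded #ifdef alternative to the macro wrapping; the flag name follows the patch, everything else below is invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define CONFIG_DRM_I915_SELFTEST	/* pretend the Kconfig option is on */

struct fake_ce {
#ifdef CONFIG_DRM_I915_SELFTEST
	bool drop_deregister;	/* selftest-only error injection */
#endif
};

static int deregister_done(struct fake_ce *ce)
{
#ifdef CONFIG_DRM_I915_SELFTEST
	/* Open-coded #ifdef instead of I915_SELFTEST_DECLARE(...):
	 * the body is ordinary statements, so it reads (and parses,
	 * e.g. for kerneldoc) like normal code. */
	if (ce->drop_deregister) {
		ce->drop_deregister = false;
		return 0;	/* message "lost", nothing processed */
	}
#endif
	return 1;		/* normal processing */
}
```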


> + return 0;
> + }
> +
>   if (context_wait_for_deregister_to_register(ce)) {
>   struct intel_runtime_pm *runtime_pm =
>   &ce->engine->gt->i915->runtime_pm;
> @@ -2672,10 +2677,19 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
> *guc,
>   trace_intel_context_sched_done(ce);
>  
>   if (context_pending_enable(ce)) {
> + if (I915_SELFTEST_ONLY(ce->drop_schedule_enable)) {
> + I915_SELFTEST_DECLARE(ce->drop_schedule_enable = false;)
> + return 0;
> + }
>   clr_context_pending_enable(ce);
>   } else if (context_pending_disable(ce)) {
>   bool banned;
>  
> + if (I915_SELFTEST_ONLY(ce->drop_schedule_disable)) {
> + I915_SELFTEST_DECLARE(ce->drop_schedule_disable = 
> false;)
> + return 0;
> + }
> +
>   /*
>* Unpin must be done before __guc_signal_context_fence,
>* otherwise a race exists between the requests getting
> @@ -3047,3 +3061,7 @@ bool intel_guc_virtual_engine_has_heartbeat(const 
> struct intel_engine_cs *ve)
>  
>   return false;
>  }
> +
> +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> +#include "selftest_guc.c"
> +#endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc.c 
> b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> new file mode 100644
> index ..46ca6554f65d
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc.c
> @@ -0,0 +1,126 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2021 Intel Corporation
> + */
> +
> +#include "selftests/intel_scheduler_helpers.h"
> +
> +static struct i915_request *nop_user_request(struct intel_context *ce,
> +  struct i915_request *from)
> +{
> + struct i915_request *rq;
> + int ret;
> +
> + rq = intel_context_create_request(ce);
> + if (IS_ERR(rq))
> + return rq;
> +
> + if (from) {
> + ret = i915_sw_fence_await_dma_fence(&rq->submit,
> + &from->fence, 0,
> + I915_FENCE_GFP);
> + if (ret < 0) {
> + i915_request_put(rq);
> +
