On Mon, May 15, 2017 at 04:47:16PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 15/05/17 10:41, Michal Wajdeczko wrote:
> > On Mon, May 15, 2017 at 09:35:30AM -0700, Daniele Ceraolo Spurio wrote:
> > > We're currently deleting the GuC logs if the FW fails to load, but those
> > > are still useful to understand why the loading failed. Keeping the
> > > object around allows us to access them after driver load is completed.
> > > 
> > > v2: keep the object around instead of using kernel memory (chris)
> > >     don't store the logs in the gpu_error struct (Chris)
> > >     add a check on guc_log_level to avoid snapshotting empty logs
> > > 
> > > v3: use separate debugfs for error log (Chris)
> > > 
> > > v4: rebased
> > > 
> > > Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk> (v3)
> > > Cc: Chris Wilson <ch...@chris-wilson.co.uk>
> > > Cc: Oscar Mateo <oscar.ma...@intel.com>
> > > Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
> > > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospu...@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_debugfs.c  | 35 
> > > ++++++++++++++++++++++-------------
> > >  drivers/gpu/drm/i915/i915_drv.c      |  3 +++
> > >  drivers/gpu/drm/i915/intel_guc_log.c | 17 +++++++++++++++++
> > >  drivers/gpu/drm/i915/intel_uc.c      |  7 +++++--
> > >  drivers/gpu/drm/i915/intel_uc.h      |  5 +++++
> > >  5 files changed, 52 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> > > b/drivers/gpu/drm/i915/i915_debugfs.c
> > > index bd9abef..d9a5bd8 100644
> > > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > > @@ -2607,27 +2607,35 @@ static int i915_guc_stage_pool(struct seq_file 
> > > *m, void *data)
> > > 
> > >  static int i915_guc_log_dump(struct seq_file *m, void *data)
> > >  {
> > > - struct drm_i915_private *dev_priv = node_to_i915(m->private);
> > > + struct drm_info_node *node = m->private;
> > > + struct drm_i915_private *dev_priv = node_to_i915(node);
> > > + bool dump_err_log = !!node->info_ent->data;
> > >   struct drm_i915_gem_object *obj;
> > > - int i = 0, pg;
> > > + u32 *log;
> > > + int i = 0;
> > > 
> > > - if (!dev_priv->guc.log.vma)
> > > + if (!dump_err_log && dev_priv->guc.log.vma)
> > > +         obj = dev_priv->guc.log.vma->obj;
> > > + else if (dump_err_log && dev_priv->guc.err_load_log)
> > > +         obj = dev_priv->guc.err_load_log;
> > > + else
> > >           return 0;
> > 
> > Maybe we can move actual printfs into separate generic function that
> > will just take desired obj? It will simplify this function and possibly
> > allow more reuse of that new funcion in the future.
> > 
> 
> The problem here would be that each debugfs function has its own desired
> formatting and might only be interested in a section of an object (e.g.
> i915_dump_lrc_obj), not sure how much we can reuse. I'd prefer to wait for
> an actual use case before splitting it out.
> 
> > We can even try define two separate small functions with names that
> > corresponds to the debugfs entry, to follow all other functions
> > 
> 
> We already have a case where 2 debugfs entries are pointing to the same
> function (i915_gem_gtt and i915_gem_pin_display) and it feels cleaner to me
> compared to having 2 one-liner function just to pick between the different
> objects. If you're ok with it I'd prefer to instead clean up the if
> statement the following way to make the obj selection more evident:
> 
>       if (dump_err_log)
>               obj = dev_priv->guc.log.err_load;
>       else
>               obj = dev_priv->guc.log.vma->obj;
> 
>       if (!obj)
>               return 0;
> 

Ok, we start with this.

> > > 
> > > - obj = dev_priv->guc.log.vma->obj;
> > > - for (pg = 0; pg < obj->base.size / PAGE_SIZE; pg++) {
> > > -         u32 *log = kmap_atomic(i915_gem_object_get_page(obj, pg));
> > > -
> > > -         for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
> > > -                 seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> > > -                            *(log + i), *(log + i + 1),
> > > -                            *(log + i + 2), *(log + i + 3));
> > > -
> > > -         kunmap_atomic(log);
> > > + log = i915_gem_object_pin_map(obj, I915_MAP_WC);
> > > + if (IS_ERR(log)) {
> > > +         DRM_ERROR("Failed to pin guc_log object\n");
> > > +         return PTR_ERR(log);
> > >   }
> > > 
> > > + for (i = 0; i < obj->base.size / sizeof(u32); i += 4)
> > > +         seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> > > +                    *(log + i), *(log + i + 1),
> > > +                    *(log + i + 2), *(log + i + 3));
> > > +
> > >   seq_putc(m, '\n');
> > > 
> > > + i915_gem_object_unpin_map(obj);
> > > +
> > >   return 0;
> > >  }
> > > 
> > > @@ -4811,6 +4819,7 @@ static int i915_hpd_storm_ctl_open(struct inode 
> > > *inode, struct file *file)
> > >   {"i915_guc_info", i915_guc_info, 0},
> > >   {"i915_guc_load_status", i915_guc_load_status_info, 0},
> > >   {"i915_guc_log_dump", i915_guc_log_dump, 0},
> > > + {"i915_guc_err_load_log_dump", i915_guc_log_dump, 0, (void *)1},
> > >   {"i915_guc_stage_pool", i915_guc_stage_pool, 0},
> > >   {"i915_huc_load_status", i915_huc_load_status_info, 0},
> > >   {"i915_frequency_info", i915_frequency_info, 0},
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> > > b/drivers/gpu/drm/i915/i915_drv.c
> > > index 72fb47a..d309165 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1363,6 +1363,9 @@ void i915_driver_unload(struct drm_device *dev)
> > >   cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> > >   i915_reset_error_state(dev_priv);
> > > 
> > > + /* release GuC error log (if any) */
> > > + i915_guc_load_error_log_free(&dev_priv->guc);
> > 
> > Maybe better place for this call would be in intel_uc_fini_hw() that is
> > already called from i915_gem_fini().
> > 
> 
> Agreed, this positioning is a remnant of v1. Will move.
> 
> > > +
> > >   /* Flush any outstanding unpin_work. */
> > >   drain_workqueue(dev_priv->wq);
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_guc_log.c 
> > > b/drivers/gpu/drm/i915/intel_guc_log.c
> > > index 16d3b87..691da42 100644
> > > --- a/drivers/gpu/drm/i915/intel_guc_log.c
> > > +++ b/drivers/gpu/drm/i915/intel_guc_log.c
> > > @@ -660,3 +660,20 @@ void i915_guc_log_unregister(struct drm_i915_private 
> > > *dev_priv)
> > >   guc_log_runtime_destroy(&dev_priv->guc);
> > >   mutex_unlock(&dev_priv->drm.struct_mutex);
> > >  }
> > > +
> > > +void i915_guc_load_error_log_capture(struct intel_guc *guc)
> > > +{
> > > + if (!guc->log.vma || i915.guc_log_level < 0)
> > > +         return;
> > > +
> > > + if (!guc->err_load_log)
> > > +         guc->err_load_log = i915_gem_object_get(guc->log.vma->obj);
> > > +
> > > + return;
> > > +}
> > > +
> > > +void i915_guc_load_error_log_free(struct intel_guc *guc)
> > > +{
> > > + if (guc->err_load_log)
> > > +         i915_gem_object_put(guc->err_load_log);
> > > +}
> > > diff --git a/drivers/gpu/drm/i915/intel_uc.c 
> > > b/drivers/gpu/drm/i915/intel_uc.c
> > > index 07c5658..3669b57 100644
> > > --- a/drivers/gpu/drm/i915/intel_uc.c
> > > +++ b/drivers/gpu/drm/i915/intel_uc.c
> > > @@ -309,6 +309,7 @@ int intel_uc_init_hw(struct drm_i915_private 
> > > *dev_priv)
> > > 
> > >   guc_disable_communication(guc);
> > >   gen9_reset_guc_interrupts(dev_priv);
> > > + i915_guc_load_error_log_free(guc);
> > > 
> > >   /* We need to notify the guc whenever we change the GGTT */
> > >   i915_ggtt_enable_guc(dev_priv);
> > > @@ -355,11 +356,11 @@ int intel_uc_init_hw(struct drm_i915_private 
> > > *dev_priv)
> > > 
> > >   /* Did we succeded or run out of retries? */
> > >   if (ret)
> > > -         goto err_submission;
> > > +         goto err_log_capture;
> > > 
> > >   ret = guc_enable_communication(guc);
> > >   if (ret)
> > > -         goto err_submission;
> > > +         goto err_log_capture;
> > > 
> > >   intel_guc_auth_huc(dev_priv);
> > >   if (i915.enable_guc_submission) {
> > > @@ -385,6 +386,8 @@ int intel_uc_init_hw(struct drm_i915_private 
> > > *dev_priv)
> > >  err_interrupts:
> > >   guc_disable_communication(guc);
> > >   gen9_disable_guc_interrupts(dev_priv);
> > > +err_log_capture:
> > > + i915_guc_load_error_log_capture(guc);
> > >  err_submission:
> > >   if (i915.enable_guc_submission)
> > >           i915_guc_submission_fini(dev_priv);
> > > diff --git a/drivers/gpu/drm/i915/intel_uc.h 
> > > b/drivers/gpu/drm/i915/intel_uc.h
> > > index 7618b71..d2ed030 100644
> > > --- a/drivers/gpu/drm/i915/intel_uc.h
> > > +++ b/drivers/gpu/drm/i915/intel_uc.h
> > > @@ -220,6 +220,9 @@ struct intel_guc {
> > > 
> > >   /* GuC's FW specific notify function */
> > >   void (*notify)(struct intel_guc *guc);
> > > +
> > > + /* Log snapshot if GuC errors during load */
> > > + struct drm_i915_gem_object *err_load_log;
> > 
> > Hmm, as you put new functions in guc_log.c file, then maybe this member
> > can be placed in the struct guc_log (as "snapshot") ?
> > 
> > or, if we want to keep "log" struct untouched, then at least move this
> > member up, closer to the existing fw and log members (as it is related)
> > and also consider to move capture code to intel_uc.c as static functions
> > 
> 
> I agree it makes more sense to move it inside the log struct to keep
> everything log-related in one place, I'll move it to "guc.log.err_load". For
> the same reason I'd prefer to keep the functions in intel_guc_log.c.
> 

But then, maybe they shall be named like this:

        void intel_guc_log_capture_err(struct intel_guc *guc);
        void intel_guc_log_free_err(struct intel_guc *guc);


Thanks,
Michal

> > >  };
> > > 
> > >  struct intel_huc {
> > > @@ -272,6 +275,8 @@ static inline void intel_guc_notify(struct intel_guc 
> > > *guc)
> > >  int i915_guc_log_control(struct drm_i915_private *dev_priv, u64 
> > > control_val);
> > >  void i915_guc_log_register(struct drm_i915_private *dev_priv);
> > >  void i915_guc_log_unregister(struct drm_i915_private *dev_priv);
> > > +void i915_guc_load_error_log_capture(struct intel_guc *guc);
> > > +void i915_guc_load_error_log_free(struct intel_guc *guc);
> > 
> > bikeshed: we are trying to always use "intel_guc" prefix for functions that
> > take "guc" as param. Prefix "i915_" is for functions that take "dev_priv".
> > 
> 
> Noted, will update.
> 
> Thanks,
> Daniele
> 
> > -Michal
> > 
> > > 
> > >  static inline u32 guc_ggtt_offset(struct i915_vma *vma)
> > >  {
> > > --
> > > 1.9.1
> > > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to