2015-09-02 17:53 GMT-03:00 ch...@chris-wilson.co.uk <ch...@chris-wilson.co.uk>: > On Wed, Sep 02, 2015 at 08:20:52PM +0000, Zanoni, Paulo R wrote: >> Em Qua, 2015-08-26 às 08:44 +0100, Chris Wilson escreveu: >> > On Tue, Aug 25, 2015 at 07:03:42PM -0300, Paulo Zanoni wrote: >> > > The unclaimed register bit is only triggered when someone touches >> > > the >> > > specified register range. >> > > >> > > For the normal use case (with i915.mmio_debug=0), this commit will >> > > avoid the extra __raw_i915_read32() call for every register outside >> > > the specified range, at the expense of a few additional "if" >> > > statements. >> > > >> > > Cc: Chris Wilson <ch...@chris-wilson.co.uk> >> > > Signed-off-by: Paulo Zanoni <paulo.r.zan...@intel.com> >> > >> > >> > > @@ -612,7 +614,7 @@ hsw_unclaimed_reg_debug(struct drm_i915_private >> > > *dev_priv, u32 reg, bool read, >> > > const char *op = read ? "reading" : "writing to"; >> > > const char *when = before ? "before" : "after"; >> > > >> > > - if (!i915.mmio_debug) >> > > + if (!i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg)) >> > > return; >> > >> > Place the register check on the preceding line so that it is not >> > confused with checking the debug-enabled status. (Also you can argue >> > that reg will be register/cache-hot and so a cheaper test to do first >> > than loading a module parameter.) >> >> That makes sense, I'll do it. >> >> > >> > > if (__raw_i915_read32(dev_priv, FPGA_DBG) & >> > > FPGA_DBG_RM_NOCLAIM) { >> > > @@ -624,11 +626,11 @@ hsw_unclaimed_reg_debug(struct >> > > drm_i915_private *dev_priv, u32 reg, bool read, >> > > } >> > > >> > > static void >> > > -hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv) >> > > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv, u32 >> > > reg) >> > > { >> > > static bool mmio_debug_once = true; >> > > >> > > - if (i915.mmio_debug || !mmio_debug_once) >> > > + if (i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg) || >> > > !mmio_debug_once) >> > > return; >> > >> > The use is wrong here though because you never reviewed my patch to >> > enable the single-shot debugging from the interrupt handler to catch >> > invalid usage from elsewhere. >> >> If you're talking about intel_uncore_check_errors(), I think we can >> just kill it now. I'll submit a patch with the arguments, so we can >> continue this topic there. >> >> Moving back to the main subject: >> >> In the last time you sent the patch to do the unclaimed checks only on >> forcewake code, I started the conversations with the HW guys about the >> range of registers relevant by FPGA_DBG (and CCd you on these >> conversations). My plan was to write this patch at that time, but >> priorities happened and it got postponed :(. I also hoped that maybe >> you would eventually end up writing it and I would just have to review >> it :) >> >> My main argument was that by doing unclaimed register checking only at >> forcewake time we would be running a check that is only relevant to >> display code during non-display code. So my idea is that if we restrict >> the unclaimed check to the actual range of registers that can be caught >> we basically remove unclaimed register checking for most (all?) of the >> performance-sensitive code. >> >> Since we have multiple solutions, I decided to do some experiments. >> First of all, since I was not really seeing hsw_unclaimed_reg_detect() >> on perf, I decided to patch it so it would do the read/write check >> around FPGA_DBG 50 times per call instead of just one. With this, by >> running "perf record glxgears" for a few seconds I could see >> hsw_unclaimed_reg_detect() taking about 4.6% of the total time. > > Something is very suspect with your system if you are not observing much > hsw_unclaimed_reg_check during trivial workloads.
Ok, I should have said "almost not seeing" instead of "not really seeing": my bad. I see about 0.24% on perf with plain drm-intel-nightly. The problem is that any of the next patches reduces this down to 0.01% or 0.02%, so it's hard to compare the optimized patches. By doing 50 reg reads on detect() I can make the numbers of the optimized patches a little bigger, so easier to compare. > >> Then I applied just this patch, and the time went down to 0.13%. I also >> applied your patch on top of it all, and it went up to 0.52% (I guess >> the extra checks at forcewake time cost a little bit). Also notice that >> since I add a "reg" argument to hsw_unclaimed_reg_detect(), I changed >> the forcewake call to use a register in the display range. >> I also tested your patch without my patch, and the measured result was >> about 0.56%. >> >> Now this isn't a super relevant experiment: just glxgears with a >> modified hsw_unclaimed_reg_detect(), but I thought it would be useful >> information, and maybe you could provide me suggestions for better >> workloads. >> >> So I'd like to know if you're ok with proceeding with just this patch >> (considering I implement your suggestions), or if you think we should >> go with just your patch or both or none. > > If you really wanted to, you could combine approaches a check in the > forcewake handler as demonstrated is an order of magnitude less frequent > than the register accesses. The key point is that we *need* the checking > against other users, not just our known register access. Checking our own > access basically amounts to detecting invalid registers, which your > approach more or less erradicates (since it is a flag we add to known good > registers, we may as well just compile it and only turn it on during hw > bringup) and that we have the power well awake. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Paulo Zanoni _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx