Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround
On Sat, 21 Mar 2020 16:26:42 -0700, Dixit, Ashutosh wrote: > > On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote: > > > > From: Lionel Landwerlin > > > @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct > > i915_perf_stream *stream) > > */ > > spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); > > > > hw_tail = stream->perf->ops.oa_hw_tail_read(stream); > > > > hw_tail &= ~(report_size - 1); > > > > @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct > > i915_perf_stream *stream) > > > > now = ktime_get_mono_fast_ns(); > > > > + if (hw_tail == stream->oa_buffer.aging_tail && > > + (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) { > > + /* If the HW tail hasn't move since the last check and the HW > > +* tail has been aging for long enough, declare it the new > > +* tail. > > +*/ > > + stream->oa_buffer.tail = stream->oa_buffer.aging_tail; > > + } else { > > + u32 head, tail; > > > > + /* NB: The head we observe here might effectively be a little > > +* out of date. If a read() is in progress, the head could be > > +* anywhere between this head and stream->oa_buffer.tail. > > +*/ > > + head = stream->oa_buffer.head - gtt_offset; > > > > + hw_tail -= gtt_offset; > > + tail = hw_tail; > > > > + /* Walk the stream backward until we find a report with dword 0 > > +* & 1 not at 0. Since the circular buffer pointers progress by > > +* increments of 64 bytes and that reports can be up to 256 > > +* bytes long, we can't tell whether a report has fully landed > > +* in memory before the first 2 dwords of the following report > > +* have effectively landed. > > +* > > +* This is assuming that the writes of the OA unit land in > > +* memory in the order they were written to. > > +* If not : (╯°□°)╯︵ ┻━┻ > > */ > > + while (OA_TAKEN(tail, head) >= report_size) { > > + u32 previous_tail = (tail - report_size) & > > (OA_BUFFER_SIZE - 1); > > + u32 *report32 = (void *)(stream->oa_buffer.vaddr + > > previous_tail); > > Sorry, this is wrong. This should just be: > > tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); > report32 = (void *)(stream->oa_buffer.vaddr + tail); > > Otherwise when we break out of the loop below tail is still set one > report_size ahead. previous_tail is not needed. (In the previous version of > the patch this used to work out correctly). > > > + > > + /* Head of the report indicated by the HW tail register > > has > > +* indeed landed into memory. > > +*/ > > + if (report32[0] != 0 || report32[1] != 0) > > + break; > > + > > + tail = previous_tail; > > } Actually a couple of further improvements to the loop above are possible. First there is no reason to start at previous_tail, we can just start at the aligned hw_tail itself. Therefore the loop becomes: while (OA_TAKEN(tail, head) >= report_size) { u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail); if (report32[0] != 0 || report32[1] != 0) break; tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); } Further, there is no reason to go back to the head but only to the old tail. Therefore: head = stream->oa_buffer.head - gtt_offset; old_tail = stream->oa_buffer.tail - gtt_offset; hw_tail -= gtt_offset; tail = hw_tail; while (OA_TAKEN(tail, old_tail) >= report_size) { u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail); if (report32[0] != 0 || report32[1] != 0) break; tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); } Please review and see if these two improvements are possible. Thanks! ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround
On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote: > > From: Lionel Landwerlin > > We're about to introduce an options to open the perf stream, giving > the user ability to configure how often it wants the kernel to poll > the OA registers for available data. > > Right now the workaround against the OA tail pointer race condition > requires at least twice the internal kernel polling timer to make any > data available. > > This changes introduce checks on the OA data written into the circular > buffer to make as much data as possible available on the first > iteration of the polling timer. /snip/ > diff --git a/drivers/gpu/drm/i915/i915_perf.c > b/drivers/gpu/drm/i915/i915_perf.c > index 3222f6cd8255..c1429d3acaf9 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -223,26 +223,17 @@ > * > * Although this can be observed explicitly while copying reports to > userspace > * by checking for a zeroed report-id field in tail reports, we want to > account > - * for this earlier, as part of the oa_buffer_check to avoid lots of > redundant > - * read() attempts. > - * > - * In effect we define a tail pointer for reading that lags the real tail > - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough > - * time for the corresponding reports to become visible to the CPU. > - * > - * To manage this we actually track two tail pointers: > - * 1) An 'aging' tail with an associated timestamp that is tracked until we > - * can trust the corresponding data is visible to the CPU; at which point > - * it is considered 'aged'. > - * 2) An 'aged' tail that can be used for read()ing. > - * > - * The two separate pointers let us decouple read()s from tail pointer aging. > - * > - * The tail pointers are checked and updated at a limited rate within a > hrtimer > - * callback (the same callback that is used for delivering EPOLLIN events) > - * > - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which > - * indicates that an updated tail pointer is needed. > + * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of > + * redundant read() attempts. > + * > + * We workaround this issue in oa_buffer_check_unlocked() by reading the > reports > + * in the OA buffer, starting from the tail reported by the HW until we find > 2 > + * consecutive reports with their first 2 dwords of not at 0. Those dwords > are until we find a report with its first 2 dwords not 0 meaning its previous report is completely in memory and ready to be read. > + * also set to 0 once read and the whole buffer is cleared upon OA buffer > + * initialization. The first dword is the reason for this report while the > + * second is the timestamp, making the chances of having those 2 fields at 0 > + * fairly unlikely. A more detailed explanation is available in > + * oa_buffer_check_unlocked(). > @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct > i915_perf_stream *stream) >*/ > spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); > > hw_tail = stream->perf->ops.oa_hw_tail_read(stream); > > hw_tail &= ~(report_size - 1); > > @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct > i915_perf_stream *stream) > > now = ktime_get_mono_fast_ns(); > > + if (hw_tail == stream->oa_buffer.aging_tail && > +(now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) { > + /* If the HW tail hasn't move since the last check and the HW > + * tail has been aging for long enough, declare it the new > + * tail. > + */ > + stream->oa_buffer.tail = stream->oa_buffer.aging_tail; > + } else { > + u32 head, tail; > > + /* NB: The head we observe here might effectively be a little > + * out of date. If a read() is in progress, the head could be > + * anywhere between this head and stream->oa_buffer.tail. > + */ > + head = stream->oa_buffer.head - gtt_offset; > > + hw_tail -= gtt_offset; > + tail = hw_tail; > > + /* Walk the stream backward until we find a report with dword 0 > + * & 1 not at 0. Since the circular buffer pointers progress by > + * increments of 64 bytes and that reports can be up to 256 > + * bytes long, we can't tell whether a report has fully landed > + * in memory before the first 2 dwords of the following report > + * have effectively landed. > + * > + * This is assuming that the writes of the OA unit land in > + * memory in the order they were written to. > + * If not : (╯°□°)╯︵ ┻━┻ >*/ > - if (hw_tail >= gtt_offset && > - hw_tail < (gtt_offset + OA_BUFFER_SIZE)) { > - stream->oa_buffer.tails[!aged_id
[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts (rev2)
== Series Details == Series: series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts (rev2) URL : https://patchwork.freedesktop.org/series/74938/ State : success == Summary == CI Bug Log - changes from CI_DRM_8167_full -> Patchwork_17043_full Summary --- **WARNING** Minor unknown changes coming with Patchwork_17043_full need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_17043_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_17043_full: ### IGT changes ### Warnings * igt@gem_ctx_persistence@close-replace-race: - shard-apl: [INCOMPLETE][1] ([fdo#103927] / [i915#1402]) -> [DMESG-WARN][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl6/igt@gem_ctx_persiste...@close-replace-race.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-apl2/igt@gem_ctx_persiste...@close-replace-race.html New tests - New tests have been introduced between CI_DRM_8167_full and Patchwork_17043_full: ### New IGT tests (3) ### * igt@gem_mmap@pf-nonblock: - Statuses : 8 pass(s) - Exec time: [0.00, 0.01] s * igt@gem_mmap_gtt@pf-nonblock: - Statuses : 7 pass(s) - Exec time: [0.00, 0.02] s * igt@gem_mmap_wc@pf-nonblock: - Statuses : 8 pass(s) - Exec time: [0.00, 0.01] s Known issues Here are the changes found in Patchwork_17043_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_busy@busy-vcs1: - shard-iclb: [PASS][3] -> [SKIP][4] ([fdo#112080]) +21 similar issues [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb2/igt@gem_b...@busy-vcs1.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-iclb6/igt@gem_b...@busy-vcs1.html * igt@gem_ctx_isolation@rcs0-s3: - shard-kbl: [PASS][5] -> [DMESG-WARN][6] ([i915#180]) +1 similar issue [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl2/igt@gem_ctx_isolat...@rcs0-s3.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-kbl7/igt@gem_ctx_isolat...@rcs0-s3.html * igt@gem_ctx_shared@exec-single-timeline-bsd: - shard-iclb: [PASS][7] -> [SKIP][8] ([fdo#110841]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb5/igt@gem_ctx_sha...@exec-single-timeline-bsd.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-iclb2/igt@gem_ctx_sha...@exec-single-timeline-bsd.html * igt@gem_exec_balancer@hang: - shard-tglb: [PASS][9] -> [FAIL][10] ([i915#1277]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-tglb6/igt@gem_exec_balan...@hang.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-tglb6/igt@gem_exec_balan...@hang.html * igt@gem_exec_schedule@implicit-read-write-bsd2: - shard-iclb: [PASS][11] -> [SKIP][12] ([fdo#109276] / [i915#677]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb2/igt@gem_exec_sched...@implicit-read-write-bsd2.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-iclb6/igt@gem_exec_sched...@implicit-read-write-bsd2.html * igt@gem_exec_schedule@implicit-write-read-bsd: - shard-iclb: [PASS][13] -> [SKIP][14] ([i915#677]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb8/igt@gem_exec_sched...@implicit-write-read-bsd.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-iclb4/igt@gem_exec_sched...@implicit-write-read-bsd.html * igt@gem_exec_schedule@in-order-bsd: - shard-iclb: [PASS][15] -> [SKIP][16] ([fdo#112146]) +7 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb5/igt@gem_exec_sched...@in-order-bsd.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-iclb2/igt@gem_exec_sched...@in-order-bsd.html * igt@i915_pm_dc@dc6-psr: - shard-skl: [PASS][17] -> [FAIL][18] ([i915#454]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl6/igt@i915_pm...@dc6-psr.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-skl10/igt@i915_pm...@dc6-psr.html * igt@i915_selftest@live@mman: - shard-apl: [PASS][19] -> [INCOMPLETE][20] ([fdo#103927]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl4/igt@i915_selftest@l...@mman.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/shard-apl7/igt@i915_selftest@l...@mman.html * igt@kms_cursor_crc@pipe-a-cursor-suspend: - shard-skl: [PASS][21] -> [INCOMPL
[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts (rev2)
== Series Details == Series: series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts (rev2) URL : https://patchwork.freedesktop.org/series/74938/ State : success == Summary == CI Bug Log - changes from CI_DRM_8167 -> Patchwork_17043 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/index.html Known issues Here are the changes found in Patchwork_17043 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live@execlists: - fi-bxt-dsi: [PASS][1] -> [INCOMPLETE][2] ([fdo#103927] / [i915#656]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-bxt-dsi/igt@i915_selftest@l...@execlists.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/fi-bxt-dsi/igt@i915_selftest@l...@execlists.html Possible fixes * igt@kms_chamelium@hdmi-crc-fast: - fi-icl-u2: [FAIL][3] ([fdo#109635] / [i915#217]) -> [PASS][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-icl-u2/igt@kms_chamel...@hdmi-crc-fast.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/fi-icl-u2/igt@kms_chamel...@hdmi-crc-fast.html * igt@kms_chamelium@hdmi-hpd-fast: - fi-kbl-7500u: [FAIL][5] ([fdo#111407]) -> [PASS][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-kbl-7500u/igt@kms_chamel...@hdmi-hpd-fast.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/fi-kbl-7500u/igt@kms_chamel...@hdmi-hpd-fast.html Warnings * igt@gem_exec_suspend@basic-s4-devices: - fi-tgl-y: [FAIL][7] ([CI#94]) -> [INCOMPLETE][8] ([CI#94] / [i915#460]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-tgl-y/igt@gem_exec_susp...@basic-s4-devices.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/fi-tgl-y/igt@gem_exec_susp...@basic-s4-devices.html [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94 [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927 [fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407 [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217 [i915#460]: https://gitlab.freedesktop.org/drm/intel/issues/460 [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656 Participating hosts (49 -> 42) -- Additional (1): fi-kbl-7560u Missing(8): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-pnv-d510 fi-byt-clapper fi-bdw-samus Build changes - * CI: CI-20190529 -> None * Linux: CI_DRM_8167 -> Patchwork_17043 CI-20190529: 20190529 CI_DRM_8167: b51a7e7f4f72cf780661a1e4b479e2b27ddbafc8 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_5526: f49ebeee9f54d6f23c60a842f75f65561d452ab0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_17043: bed27875a49f92e4f6f2c7d301067b7d3cb3e86c @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == bed27875a49f drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 1aa7fb03cf0c drm/i915: Rely on direct submission to the queue a3349fcb8e49 drm/i915/gt: Delay release of engine-pm after last retirement 381c9adb228f drm/i915: Extend intel_wakeref to support delayed puts == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17043/index.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts
== Series Details == Series: series starting with [1/4] drm/i915: Extend intel_wakeref to support delayed puts URL : https://patchwork.freedesktop.org/series/74938/ State : failure == Summary == CI Bug Log - changes from CI_DRM_8167 -> Patchwork_17042 Summary --- **FAILURE** Serious unknown changes coming with Patchwork_17042 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_17042, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/index.html Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_17042: ### IGT changes ### Possible regressions * igt@i915_selftest@live@mman: - fi-icl-dsi: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-icl-dsi/igt@i915_selftest@l...@mman.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/fi-icl-dsi/igt@i915_selftest@l...@mman.html Known issues Here are the changes found in Patchwork_17042 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live@gem_contexts: - fi-cml-s: [PASS][3] -> [DMESG-FAIL][4] ([i915#877]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-cml-s/igt@i915_selftest@live@gem_contexts.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/fi-cml-s/igt@i915_selftest@live@gem_contexts.html - fi-cfl-guc: [PASS][5] -> [INCOMPLETE][6] ([fdo#106070] / [i915#424]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html Possible fixes * igt@gem_exec_suspend@basic-s4-devices: - fi-tgl-y: [FAIL][7] ([CI#94]) -> [PASS][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-tgl-y/igt@gem_exec_susp...@basic-s4-devices.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/fi-tgl-y/igt@gem_exec_susp...@basic-s4-devices.html * igt@kms_chamelium@hdmi-crc-fast: - fi-icl-u2: [FAIL][9] ([fdo#109635] / [i915#217]) -> [PASS][10] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-icl-u2/igt@kms_chamel...@hdmi-crc-fast.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/fi-icl-u2/igt@kms_chamel...@hdmi-crc-fast.html [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94 [fdo#106070]: https://bugs.freedesktop.org/show_bug.cgi?id=106070 [fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217 [i915#424]: https://gitlab.freedesktop.org/drm/intel/issues/424 [i915#877]: https://gitlab.freedesktop.org/drm/intel/issues/877 Participating hosts (49 -> 36) -- Missing(13): fi-ilk-m540 fi-hsw-4200u fi-byt-j1900 fi-skl-guc fi-byt-squawks fi-bsw-cyan fi-kbl-7500u fi-ctg-p8600 fi-gdg-551 fi-cfl-8109u fi-byt-clapper fi-bsw-nick fi-bdw-samus Build changes - * CI: CI-20190529 -> None * Linux: CI_DRM_8167 -> Patchwork_17042 CI-20190529: 20190529 CI_DRM_8167: b51a7e7f4f72cf780661a1e4b479e2b27ddbafc8 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_5526: f49ebeee9f54d6f23c60a842f75f65561d452ab0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_17042: ad93345a88ebe797bf424c9cd6a32375228d30a3 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == ad93345a88eb drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 87403a98caa1 drm/i915: Rely on direct submission to the queue a4eb01fe1100 drm/i915/gt: Delay release of engine-pm after last retirement ba10302d1256 drm/i915: Extend intel_wakeref to support delayed puts == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17042/index.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 4/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
We dropped calling process_csb prior to handling direct submission in order to avoid the nesting of spinlocks and lift process_csb() and the majority of the tasklet out of irq-off. However, we do want to avoid ksoftirqd latency in the fast path, so try and pull the interrupt-bh local to direct submission if we can acquire the tasklet's lock. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 210f60e14ef4..82dee2141b46 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) if (reset_in_progress(execlists)) return; /* defer until we restart the engine following reset */ + /* Hopefully we clear execlists->pending[] to let us through */ + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) { + process_csb(engine); + tasklet_unlock(&execlists->tasklet); + } + __execlists_submission_tasklet(engine); } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 1/4] drm/i915: Extend intel_wakeref to support delayed puts
In some cases we want to hold onto the wakeref for a little after the last user so that we can avoid having to drop and then immediately reacquire it. Allow the last user to specify if they would like to keep the wakeref alive for a short hysteresis. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_pm.h | 6 ++ drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +- drivers/gpu/drm/i915/intel_wakeref.c| 11 ++- drivers/gpu/drm/i915/intel_wakeref.h| 10 -- 4 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h index e52c2b0cb245..418df0a13145 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h @@ -37,6 +37,12 @@ static inline void intel_engine_pm_put_async(struct intel_engine_cs *engine) intel_wakeref_put_async(&engine->wakeref); } +static inline void intel_engine_pm_put_delay(struct intel_engine_cs *engine, +unsigned long delay) +{ + intel_wakeref_put_delay(&engine->wakeref, delay); +} + static inline void intel_engine_pm_flush(struct intel_engine_cs *engine) { intel_wakeref_unlock_wait(&engine->wakeref); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 24c99d0838af..835ec184763e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -38,7 +38,7 @@ static bool flush_submission(struct intel_gt *gt) for_each_engine(engine, gt, id) { intel_engine_flush_submission(engine); active |= flush_work(&engine->retire_work); - active |= flush_work(&engine->wakeref.work); + active |= flush_delayed_work(&engine->wakeref.work); } return active; diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c index 8fbf6f4d3f26..2977bc0428e2 100644 --- a/drivers/gpu/drm/i915/intel_wakeref.c +++ b/drivers/gpu/drm/i915/intel_wakeref.c @@ -70,11 +70,11 @@ static void intel_wakeref_put_last(struct intel_wakeref *wf) void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags) { - INTEL_WAKEREF_BUG_ON(work_pending(&wf->work)); + INTEL_WAKEREF_BUG_ON(delayed_work_pending(&wf->work)); /* Assume we are not in process context and so cannot sleep. */ if (flags & INTEL_WAKEREF_PUT_ASYNC || !mutex_trylock(&wf->mutex)) { - schedule_work(&wf->work); + mod_delayed_work(system_wq, &wf->work, flags >> 1); return; } @@ -83,7 +83,7 @@ void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags) static void __intel_wakeref_put_work(struct work_struct *wrk) { - struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work); + struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work.work); if (atomic_add_unless(&wf->count, -1, 1)) return; @@ -104,8 +104,9 @@ void __intel_wakeref_init(struct intel_wakeref *wf, atomic_set(&wf->count, 0); wf->wakeref = 0; - INIT_WORK(&wf->work, __intel_wakeref_put_work); - lockdep_init_map(&wf->work.lockdep_map, "wakeref.work", &key->work, 0); + INIT_DELAYED_WORK(&wf->work, __intel_wakeref_put_work); + lockdep_init_map(&wf->work.work.lockdep_map, +"wakeref.work", &key->work, 0); } int intel_wakeref_wait_for_idle(struct intel_wakeref *wf) diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h index 7d1e676b71ef..e87532e282d2 100644 --- a/drivers/gpu/drm/i915/intel_wakeref.h +++ b/drivers/gpu/drm/i915/intel_wakeref.h @@ -41,7 +41,7 @@ struct intel_wakeref { struct intel_runtime_pm *rpm; const struct intel_wakeref_ops *ops; - struct work_struct work; + struct delayed_work work; }; struct intel_wakeref_lockclass { @@ -154,6 +154,12 @@ intel_wakeref_put_async(struct intel_wakeref *wf) __intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC); } +static inline void +intel_wakeref_put_delay(struct intel_wakeref *wf, unsigned long delay) +{ + __intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC | delay << 1); +} + /** * intel_wakeref_lock: Lock the wakeref (mutex) * @wf: the wakeref @@ -194,7 +200,7 @@ intel_wakeref_unlock_wait(struct intel_wakeref *wf) { mutex_lock(&wf->mutex); mutex_unlock(&wf->mutex); - flush_work(&wf->work); + flush_delayed_work(&wf->work); } /** -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/4] drm/i915/gt: Delay release of engine-pm after last retirement
Keep the engine-pm awake until the next jiffie, to avoid immediate ping-pong under moderate load. (Forcing the idle barrier excerbates the moderate load, dramatically increasing the driver overhead.) On the other hand, delaying the idle-barrier slightly incurs longer rc6-off and so more power consumption. Closes: https://gitlab.freedesktop.org/drm/intel/issues/848 Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context.c | 2 +- drivers/gpu/drm/i915/i915_active.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7132bf616cc4..be4bb62425ca 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -350,7 +350,7 @@ void intel_context_enter_engine(struct intel_context *ce) void intel_context_exit_engine(struct intel_context *ce) { intel_timeline_exit(ce->timeline); - intel_engine_pm_put(ce->engine); + intel_engine_pm_put_delay(ce->engine, 1); } int intel_context_prepare_remote_request(struct intel_context *ce, diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index c4048628188a..a0d31f7bfb42 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -818,7 +818,7 @@ void i915_active_acquire_barrier(struct i915_active *ref) GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); llist_add(barrier_to_ll(node), &engine->barrier_tasks); - intel_engine_pm_put(engine); + intel_engine_pm_put_delay(engine, 1); } } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 3/4] drm/i915: Rely on direct submission to the queue
Drop the pretense of kicking the tasklet (used only for the defunct guc submission backend, it should just take ownership of the submit!) and so remove the bh-kicking from around submission. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 -- drivers/gpu/drm/i915/gt/intel_lrc.c| 5 + drivers/gpu/drm/i915/i915_request.c| 2 -- 3 files changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 36d069504836..c2bd5accde0c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2348,9 +2348,7 @@ static void eb_request_add(struct i915_execbuffer *eb) __i915_request_skip(rq); } - local_bh_disable(); __i915_request_queue(rq, &attr); - local_bh_enable(); /* Kick the execlists tasklet if just scheduled */ /* Try to clean up the client's timeline after submitting the request */ if (prev) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index f09dd87324b9..210f60e14ef4 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2891,10 +2891,7 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) if (reset_in_progress(execlists)) return; /* defer until we restart the engine following reset */ - if (execlists->tasklet.func == execlists_submission_tasklet) - __execlists_submission_tasklet(engine); - else - tasklet_hi_schedule(&execlists->tasklet); + __execlists_submission_tasklet(engine); } static void submit_queue(struct intel_engine_cs *engine, diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c0df71d7d0ff..3388c5b610c5 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1444,9 +1444,7 @@ void i915_request_add(struct i915_request *rq) if (list_empty(&rq->sched.signalers_list)) attr.priority |= I915_PRIORITY_WAIT; - local_bh_disable(); __i915_request_queue(rq, &attr); - local_bh_enable(); /* Kick the execlists tasklet if just scheduled */ mutex_unlock(&tl->mutex); } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
Quoting Chris Wilson (2020-03-20 17:47:45) > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. > > v2: Tweak the balance to avoid over submitting lite-restores Sigh. It's a driver problem with lock contention, mostly from aggressive idling. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v10 00/12] Convert PWM period and duty cycle to u64
This is a giant CC list. There was one version where you CC'd me on patch 6/12 but after that you just CC'd me on the cover page. Something is messed up in your scripts because Cc'ing me on just the cover is pointless. regards, dan carpenter ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer
On Thu, 19 Mar 2020 15:52:03 -0700, Umesh Nerlige Ramappa wrote: > > From: Lionel Landwerlin > > This new parameter let's the application choose how often the OA > buffer should be checked on the CPU side for data availability. Longer > polling period tend to reduce CPU overhead if the application does not > care about somewhat real time data collection. /snip/ > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index db649d03ab52..c781b9f31b3c 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1980,6 +1980,19 @@ enum drm_i915_perf_property_id { >*/ > DRM_I915_PERF_PROP_GLOBAL_SSEU, > > + /** > + * This optional parameter specifies the timer interval in nanoseconds > + * at which the i915 driver will check the OA buffer for available data. > + * Minimum allowed value is 100 microseconds. A default value is used by > + * the driver if this parameter is not specified. Note that a large > + * value may reduce cpu consumption during OA perf captures, but it > + * would also potentially result in OA buffer overwrite as the captures > + * reach end of the OA buffer. I would like to change the wording here a bit because the wording above seems to say that the /same/ timer period which reduces cpu consumption will also potentially result in OA buffer overwrites. This is misleading and not true. Instead we should say something like: "Note that larger timer values will reduce cpu consumption during OA perf captures. However, excessively large values would potentially result in OA buffer overwrites as captures reach end of the OA buffer". Maybe some guidance to selecting the value, say based on sampling rate and OA buffer size, could also be added here but at the minimum we should fix up the wording as indicated above before pushing. Otherwise: Reviewed-by: Ashutosh Dixit > + * > + * This property is available in perf revision 5. > + */ > + DRM_I915_PERF_PROP_POLL_OA_PERIOD, > + > DRM_I915_PERF_PROP_MAX /* non-ABI */ > }; > > -- > 2.20.1 > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx