On 21/06/2022 20:11, Robert Beckett wrote:


On 21/06/2022 18:37, Patchwork wrote:
*Patch Details*
*Series:*    drm/i915: ttm for stolen (rev5)
*URL:*    https://patchwork.freedesktop.org/series/101396/
*State:*    failure
*Details:* https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html


  CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5


    Summary

*FAILURE*

Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI.

External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html


    Participating hosts (40 -> 41)

Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus


    Possible new issues

Here are the unknown changes that may have been introduced in Patchwork_101396v5:


      IGT changes


        Possible regressions

  * igt@i915_selftest@live@reset:
      o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@l...@reset.html>
        -> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@l...@reset.html>


I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally on bat-adlp-6.

Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is for now?

Alternatively, seeing the history of this in

commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low pages (128KiB) of stolen from use"

could this be an indication that the original issue is worse on adlp machines? I have only ever seen page 135 or 136 clobbered across many runs via trybot, so it looks fairly consistent.
Though excluding the use of over 540K of stolen might be too severe.

Don't know but I see that on the latest version you even hit pages 165/166.
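
For reference, a rough sketch of the arithmetic behind those numbers. It assumes the page numbers reported by the selftest are 4 KiB page indices counted from the start of stolen, which the thread does not state explicitly. The helper names are made up for illustration and are not i915 code.

```python
# Assumption: 4 KiB pages, indexed from the start of stolen memory.
PAGE_SIZE_KIB = 4


def page_span_kib(page):
    """Return (start, end) offsets in KiB for a page index; end is exclusive."""
    start = page * PAGE_SIZE_KIB
    return start, start + PAGE_SIZE_KIB


def reserve_needed_kib(highest_page):
    """KiB to exclude from the bottom of stolen so highest_page is covered."""
    return page_span_kib(highest_page)[1]


if __name__ == "__main__":
    # Pages mentioned in this thread.
    for page in (135, 136, 165, 166):
        start, end = page_span_kib(page)
        print(f"page {page}: {start}K..{end}K")
```

Under that assumption the existing 128 KiB exclusion only covers pages 0 to 31, while covering page 136 would need 548 KiB excluded and covering page 166 would need 668 KiB.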

Any history of hitting this in CI without your series? If not, are there some other changes which could explain it? Are you touching the selftest itself?

Hexdump of the clobbered page looks quite complex. Especially POISON_FREE. Any idea how that ends up there?

Btw what is the benefit of converting stolen to TTM to start with? It's not much of a backend since it just uses the drm range manager. So quite thin and uneventful. The diffstat for the series also does not look like you end up with much code reduction.

Regards,

Tvrtko
