Re: [Intel-gfx] [PATCH 1/2] drm/i915: Add support for LMEM PCIe resizable bar

2022-06-23 Thread Dandamudi, Priyanka


> -----Original Message-----
> From: Christian König 
> Sent: 18 June 2022 08:45 PM
> To: De Marchi, Lucas ; Bjorn Helgaas
> 
> Cc: linux-...@vger.kernel.org; intel-gfx@lists.freedesktop.org; Sergei
> Miroshnichenko ; linux-
> ker...@vger.kernel.org; Dandamudi, Priyanka
> ; Auld, Matthew
> ; Bjorn Helgaas 
> Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: Add support for LMEM PCIe
> resizable bar
> 
> Am 17.06.22 um 23:27 schrieb Lucas De Marchi:
> > On Fri, Jun 17, 2022 at 03:32:52PM -0500, Bjorn Helgaas wrote:
> >> [+cc Christian, author of pci_resize_resource(), Sergei, author of
> >> rebalancing patches]
> >>
> >> Hi Lucas,
> >>
> >> On Fri, Jun 17, 2022 at 11:44:41AM -0700, Lucas De Marchi wrote:
> >>> Cc'ing intel-pci, lkml, Bjorn
> >>>
> >>> On Fri, Jun 17, 2022 at 11:32:37AM +0300, Jani Nikula wrote:
> >>> > On Thu, 16 Jun 2022, priyanka.dandam...@intel.com wrote:
> >>> > > From: Akeem G Abodunrin 
> >>> > >
> >>> > > Add support for the local memory PCIe resizable bar, so that
> >>> > > local memory can be resized to the maximum size supported by the
> >>> > > device, and mapped correctly to the PCIe memory bar. It is usual
> >>> > > that GPU devices expose only 256MB BARs, primarily to be compatible
> >>> > > with 32-bit systems. So, those devices cannot claim a larger memory
> >>> > > BAR window size due to the system BIOS limitation. With this
> >>> > > change, it would be possible to reprogram the windows of the bridge
> >>> > > directly above the requesting device on the same BAR type.
> >>>
> >>> There is a big caveat here that this may be too late as other
> >>> drivers may have already mapped their BARs - so probably too late in
> >>> the pci scan for it to be effective. In fact, after using this for a
> >>> while, it seems to fail too often, particularly on CFL systems.
> >>
> >> Help me understand the "too late" part.  Do you mean that there is
> >> enough available space for the max BAR size, but it's fragmented and
> >> therefore not usable?  And that if we could do something earlier,
> >> before drivers have claimed their devices, we might be able to
> >> compact the BARs of other devices to make a larger contiguous
> >> available space?
> >
> > yes. I will dig some logs I had in the past to confirm.
> >
> >
> >> That is theoretically possible, but I think the current
> >> pci_resize_resource() only supports resizing of the specified BAR and
> >> any upstream bridge windows.  I don't think it supports moving BARs
> >> of other devices.
> >>
> >> Sergei did some nice work that might help with this situation because
> >> it can move BARs around more generally.  It hasn't quite achieved
> >> critical mass yet, but maybe this would help get there:
> >>
> >>
> >> https://lore.kernel.org/linux-pci/20201218174011.340514-1-s.miroshnichenko@yadro.com/
> >>
> >
> > oh... I hadn't thought about pause/ioremap/unpause. That looks rad :).
> > So it seems this would integrate neatly with
> > pci_resize_resource() (what this patch is doing), as long as drivers
> > for devices affected implement
> > .bar_fixed()/.rescan_prepare()/.rescan_done(). That seems it would
> > solve our issues too.
> 
> Well, we never ran into any of the issues you describe with PCIe BAR resize
> for GPUs, so there must be something you do differently from AMD hardware
> in this regard.
> 
> In addition, keep in mind that you can't resize the BAR before kicking
> out vgacon/efifb; otherwise the just-released 256MiB window may still be
> in use while you try to resize it. When that happens you usually end up
> with a hard lockup of the system.
> 
> Regards,
>
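
As a side note on the sizes discussed above: pci_resize_resource() takes the new BAR size in the Resizable BAR encoding rather than in bytes, where an encoded value n stands for 2^(n + 20) bytes. The sketch below is a user-space illustration only and is not taken from the patch; the helper names rebar_bytes_to_size() and rebar_size_to_bytes() are made up for this example, and the first is assumed to behave like the kernel's pci_rebar_bytes_to_size().

```c
#include <assert.h>
#include <stdint.h>

/* Smallest size the Resizable BAR encoding can express: 1 MiB = 2^20 bytes. */
#define REBAR_MIN_SHIFT 20

/*
 * Hypothetical helper: convert a BAR size in bytes to the Resizable BAR
 * encoding, where an encoded value n means 2^(n + 20) bytes. Sizes that are
 * not a power of two are rounded up; sizes below 1 MiB map to 0.
 */
int rebar_bytes_to_size(uint64_t bytes)
{
	int shift = 0;

	assert(bytes != 0);

	/* Compute ceil(log2(bytes)), capped at 63 to avoid shifting too far. */
	while (shift < 63 && ((uint64_t)1 << shift) < bytes)
		shift++;

	return shift < REBAR_MIN_SHIFT ? 0 : shift - REBAR_MIN_SHIFT;
}

/* Hypothetical inverse: encoded size back to bytes. */
uint64_t rebar_size_to_bytes(int size)
{
	assert(size >= 0 && size + REBAR_MIN_SHIFT < 64);

	return (uint64_t)1 << (size + REBAR_MIN_SHIFT);
}
```

With these helpers, the 256MB BAR mentioned in the commit message corresponds to an encoded size of 8, and an 8 GiB BAR would correspond to 13.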

Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 04/11] tests/i915/i915_hangman: Explicitly test per engine reset vs full GPU reset

2021-12-21 Thread Dandamudi, Priyanka
Does this test series prove that one engine can be reset without killing all 
the others, except for the RCS+CCS combination (where killing one affects the 
other, as shown by the reset stats)?


-----Original Message-----
From: igt-dev  On Behalf Of 
john.c.harri...@intel.com
Sent: 14 December 2021 04:59 AM
To: igt-...@lists.freedesktop.org
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: [igt-dev] [PATCH i-g-t 04/11] tests/i915/i915_hangman: Explicitly test 
per engine reset vs full GPU reset

From: John Harrison 

Although the hangman test was ensuring that *some* reset functionality was 
enabled, it did not differentiate what kind. The infrastructure required to 
choose between per engine reset or full GT reset was recently added. So update 
this test to use it as well.

Signed-off-by: John Harrison 
---
 tests/i915/i915_hangman.c | 77 +--
 1 file changed, 50 insertions(+), 27 deletions(-)

diff --git a/tests/i915/i915_hangman.c b/tests/i915/i915_hangman.c
index bd787d7b4..f8a48337c 100644
--- a/tests/i915/i915_hangman.c
+++ b/tests/i915/i915_hangman.c
@@ -323,40 +323,26 @@ static void hangcheck_unterminated(const intel_ctx_t *ctx)
}
 }
 
-igt_main
+static void do_tests(const char *name, const char *prefix,
+const intel_ctx_t *ctx)
 {
const struct intel_execution_engine2 *e;
-   const intel_ctx_t *ctx;
-   igt_hang_t hang = {};
-
-   igt_fixture {
-   device = drm_open_driver(DRIVER_INTEL);
-   igt_require_gem(device);
-
-   ctx = intel_ctx_create_all_physical(device);
-
-   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
-
-   sysfs = igt_sysfs_open(device);
-   igt_assert(sysfs != -1);
+   char buff[256];
 
-   igt_require(has_error_state(sysfs));
-   }
-
-   igt_describe("Basic error capture");
-   igt_subtest("error-state-basic")
-   test_error_state_basic();
-
-   igt_describe("Per engine error capture");
-   igt_subtest_with_dynamic("error-state-capture") {
+   snprintf(buff, sizeof(buff), "Per engine error capture (%s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-error-state-capture", prefix);
+   igt_subtest_with_dynamic(buff) {
for_each_ctx_engine(device, ctx, e) {
igt_dynamic_f("%s", e->name)
test_error_state_capture(ctx, e);
}
}
 
-   igt_describe("Per engine hang recovery (spin)");
-   igt_subtest_with_dynamic("engine-hang") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (spin, %s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-hang", prefix);
+   igt_subtest_with_dynamic(buff) {
 int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -374,8 +360,10 @@ igt_main
}
}
 
-   igt_describe("Per engine hang recovery (invalid CS)");
-   igt_subtest_with_dynamic("engine-error") {
+   snprintf(buff, sizeof(buff), "Per engine hang recovery (invalid CS, %s reset)", name);
+   igt_describe(buff);
+   snprintf(buff, sizeof(buff), "%s-engine-error", prefix);
+   igt_subtest_with_dynamic(buff) {
int has_gpu_reset = 0;
struct drm_i915_getparam gp = {
.param = I915_PARAM_HAS_GPU_RESET,
@@ -391,11 +379,46 @@ igt_main
test_engine_hang(ctx, e, IGT_SPIN_INVALID_CS);
}
}
+}
+
+igt_main
+{
+   const intel_ctx_t *ctx;
+   igt_hang_t hang = {};
+
+   igt_fixture {
+   device = drm_open_driver(DRIVER_INTEL);
+   igt_require_gem(device);
+
+   ctx = intel_ctx_create_all_physical(device);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE);
+
+   sysfs = igt_sysfs_open(device);
+   igt_assert(sysfs != -1);
+
+   igt_require(has_error_state(sysfs));
+   }
+
+   igt_describe("Basic error capture");
+   igt_subtest("error-state-basic")
+   test_error_state_basic();
+
 
igt_describe("Check that executing unintialised memory causes a hang");
igt_subtest("hangcheck-unterminated")
hangcheck_unterminated(ctx);
 
+   do_tests("GT", "gt", ctx);
+
+   igt_fixture {
+   igt_disallow_hang(device, hang);
+
+   hang = igt_allow_hang(device, ctx->id, HANG_ALLOW_CAPTURE | HANG_WANT_ENGINE_RESET);
+   }
+
+   do_tests("engine", "engine", ctx);
+
igt_fixture {
igt_disallow_hang(device, hang);
intel_ctx_destroy(device, ctx);
--
2.25.1
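
For readers following the renaming: do_tests() now builds each subtest name from a prefix, so the former "error-state-capture", "engine-hang" and "engine-error" subtests each appear twice, once as "gt-..." and once as "engine-...". The standalone sketch below only illustrates that naming scheme; build_subtest_name() is a hypothetical helper and not part of IGT or of this patch, which calls snprintf() directly into a 256-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of IGT: format "<prefix>-<base>" into buf,
 * matching the "%s-engine-hang" style strings used by do_tests().
 * Returns 0 on success, or -1 (with buf set to "") if the name would not fit.
 */
int build_subtest_name(char *buf, size_t len, const char *prefix,
		       const char *base)
{
	assert(buf && len > 0 && prefix && base);

	/* Need room for prefix, '-', base and the terminating NUL. */
	if (strlen(prefix) + 1 + strlen(base) >= len) {
		buf[0] = '\0';
		return -1;
	}

	snprintf(buf, len, "%s-%s", prefix, base);
	return 0;
}
```

Called with the prefixes "gt" and "engine" from the patch, this yields names such as "gt-engine-hang" and "engine-error-state-capture". Checking the length up front is a choice made for this sketch; the patch itself relies on snprintf() truncating safely.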