Re: [PATCH 4/4] drm/i915/uapi: reject set_domain for discrete
On Mon, Jul 19, 2021 at 4:10 AM Matthew Auld wrote: > > On Fri, 16 Jul 2021 at 16:23, Jason Ekstrand wrote: > > > > On Fri, Jul 16, 2021 at 9:52 AM Tvrtko Ursulin > > wrote: > > > > > > > > > On 15/07/2021 11:15, Matthew Auld wrote: > > > > The CPU domain should be static for discrete, and on DG1 we don't need > > > > any flushing since everything is already coherent, so really all this > > > > does is an object wait, for which we have an ioctl. Longer term the > > > > desired caching should be an immutable creation time property for the > > > > BO, which can be set with something like gem_create_ext. > > > > > > > > One other user is iris + userptr, which uses the set_domain to probe all > > > > the pages to check if the GUP succeeds, however we now have a PROBE > > > > flag for this purpose. > > > > > > > > v2: add some more kernel doc, also add the implicit rules with caching > > > > > > > > Suggested-by: Daniel Vetter > > > > Signed-off-by: Matthew Auld > > > > Cc: Thomas Hellström > > > > Cc: Maarten Lankhorst > > > > Cc: Tvrtko Ursulin > > > > Cc: Jordan Justen > > > > Cc: Kenneth Graunke > > > > Cc: Jason Ekstrand > > > > Cc: Daniel Vetter > > > > Cc: Ramalingam C > > > > Reviewed-by: Ramalingam C > > > > --- > > > > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ > > > > include/uapi/drm/i915_drm.h| 19 +++ > > > > 2 files changed, 22 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > > > b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > > > index 43004bef55cb..b684a62bf3b0 100644 > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > > > @@ -490,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, > > > > void *data, > > > > u32 write_domain = args->write_domain; > > > > int err; > > > > > > > > + if (IS_DGFX(to_i915(dev))) > > > > + return -ENODEV; > > > > + > > > > /* Only handle setting domains to types used by the CPU. 
*/ > > > > if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS) > > > > return -EINVAL; > > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > > > > index 2e4112bf4d38..04ce310e7ee6 100644 > > > > --- a/include/uapi/drm/i915_drm.h > > > > +++ b/include/uapi/drm/i915_drm.h > > > > @@ -901,6 +901,25 @@ struct drm_i915_gem_mmap_offset { > > > >* - I915_GEM_DOMAIN_GTT: Mappable aperture domain > > > >* > > > >* All other domains are rejected. > > > > + * > > > > + * Note that for discrete, starting from DG1, this is no longer > > > > supported, and > > > > + * is instead rejected. On such platforms the CPU domain is > > > > effectively static, > > > > + * where we also only support a single &drm_i915_gem_mmap_offset cache > > > > mode, > > > > + * which can't be set explicitly and instead depends on the object > > > > placements, > > > > + * as per the below. > > > > + * > > > > + * Implicit caching rules, starting from DG1: > > > > + * > > > > + * - If any of the object placements (see > > > > &drm_i915_gem_create_ext_memory_regions) > > > > + * contain I915_MEMORY_CLASS_DEVICE then the object will be > > > > allocated and > > > > + * mapped as write-combined only. > > > > Is this accurate? I thought they got WB when living in SMEM and WC > > when on the device. But, since both are coherent, it's safe to lie to > > userspace and say it's all WC. Is that correct or am I missing > > something? > > Yes, it's accurate, it will be allocated and mapped as WC. I think we > can just make select_tt_caching always return cached if we want, and > it looks like ttm seems to be fine with having different caching > values for the tt vs io resource. Daniel, should we adjust this? Mildly related, we had an issue some time back with i915+amdgpu where we were choosing different caching settings for SMEM shared BOs and the fallout was that we had all so
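For userspace consumers of this uapi change, the upshot can be sketched as follows. This is a hedged illustration, not Mesa's actual code: the `do_set_domain`/`do_gem_wait` stubs stand in for the real DRM_IOCTL_I915_GEM_SET_DOMAIN and DRM_IOCTL_I915_GEM_WAIT ioctl calls, and the fallback policy is an assumption — treat -ENODEV from set_domain as "domains are static on this platform" and fall back to an explicit object wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int gem_wait_calls;

/* Stub standing in for DRM_IOCTL_I915_GEM_SET_DOMAIN; on discrete
 * (DG1+) the kernel now rejects it with -ENODEV. */
static int do_set_domain(bool discrete)
{
	return discrete ? -ENODEV : 0;
}

/* Stub standing in for DRM_IOCTL_I915_GEM_WAIT. */
static int do_gem_wait(void)
{
	gem_wait_calls++;
	return 0;
}

/* Returns 0 on success.  On platforms where the CPU domain is static
 * (set_domain reports -ENODEV), everything is already coherent, so an
 * object wait is all that is needed. */
static int sync_for_cpu(bool discrete)
{
	int ret = do_set_domain(discrete);

	if (ret == -ENODEV)
		return do_gem_wait();
	return ret;
}
```

On integrated parts the set-domain path is taken as before; on discrete the helper silently degrades to the wait ioctl, which matches the commit message's point that on DG1 "all this does is an object wait, for which we have an ioctl."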
Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()
On Tue, Jul 20, 2021 at 3:25 AM Tvrtko Ursulin wrote: > > > On 19/07/2021 19:30, Jason Ekstrand wrote: > > If the driver was not fully loaded, we may still have globals lying > > around. If we don't tear those down in i915_exit(), we'll leak a bunch > > of memory slabs. This can happen two ways: use_kms = false and if we've > > run mock selftests. In either case, we have an early exit from > > i915_init which happens after i915_globals_init() and we need to clean > > up those globals. While we're here, add an explicit boolean instead of > > using a random field from i915_pci_device to detect partial loads. > > > > The mock selftests case gets especially sticky. The load isn't entirely > > a no-op. We actually do quite a bit inside those selftests including > > allocating a bunch of mock objects and running tests on them. Once all > > those tests are complete, we exit early from i915_init(). Previously, > > i915_init() would return a non-zero error code on failure and a zero > > error code on success. In the success case, we would get to i915_exit() > > and check i915_pci_driver.driver.owner to detect if i915_init exited early > > and do nothing. In the failure case, we would fail i915_init() but > > there would be no opportunity to clean up globals. > > > > The most annoying part is that you don't actually notice the failure as > > part of the self-tests since leaking a bit of memory, while bad, doesn't > > result in anything observable from userspace. Instead, the next time we > > load the driver (usually for the next IGT test), i915_globals_init() gets > > invoked again, we go to allocate a bunch of new memory slabs, those > > implicitly create debugfs entries, and debugfs warns that we're trying > > to create directories and files that already exist. Since this all > > happens as part of the next driver load, it shows up in the dmesg-warn > > of whatever IGT test ran after the mock selftests. 
> > Story checks out but I totally don't get why it wouldn't be noticed > until now. Was it perhaps part of the selftests contract that a reboot > is required after failure? No. They do unload the driver, though. They just don't re-load it. > > While the obvious thing to do here might be to call i915_globals_exit() > > after selftests, that's not actually safe. The dma-buf selftests call > > i915_gem_prime_export which creates a file. We call dma_buf_put() on > > the resulting dmabuf which calls fput() on the file. However, fput() > > isn't immediate and gets flushed right before syscall returns. This > > means that all the fput()s from the selftests don't happen until right > > before the module load syscall used to fire off the selftests returns > > which is after i915_init(). If we call i915_globals_exit() in > > i915_init() after selftests, we end up freeing slabs out from under > > objects which won't get released until fput() is flushed at the end of > > the module load. > > Nasty. Wasn't visible while globals memory leak was "in place". :I > > > The solution here is to let i915_init() return success early and detect > > the early success in i915_exit() and only tear down globals and nothing > > else. This way the module loads successfully, regardless of the success > > or failure of the tests. Because we've not enumerated any PCI devices, > > no device nodes are created and it's entirely useless from userspace. > > The only thing the module does at that point is hold on to a bit of > > memory until we unload it and i915_exit() is called. Importantly, this > > means that everything from our selftests has the ability to properly > > flush out between i915_init() and i915_exit() because there are a couple > > syscall boundaries in between. > > When you say "couple of syscall boundaries" you mean exactly two (module > init/unload) or there is more to it? Like why "couple" is needed and not > just that the module load syscall has exited? 
That part sounds > potentially dodgy. What mechanism is used by the delayed flush? > > Have you checked how this change interacts with the test runner and CI? By the end of the series, a bunch of tests are fixed. In particular, https://gitlab.freedesktop.org/drm/intel/-/issues/3746 > > > > Signed-off-by: Jason Ekstrand > > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") > > Cc: Daniel Vetter > > --- > > drivers/gpu/drm/i915/i915_pci.c | 32 +--- > > 1 file changed, 25 insertions(+), 7 deletions(-)
Re: [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()
On Tue, Jul 20, 2021 at 9:18 AM Daniel Vetter wrote: > > On Mon, Jul 19, 2021 at 01:30:44PM -0500, Jason Ekstrand wrote: > > If the driver was not fully loaded, we may still have globals lying > > around. If we don't tear those down in i915_exit(), we'll leak a bunch > > of memory slabs. This can happen two ways: use_kms = false and if we've > > run mock selftests. In either case, we have an early exit from > > i915_init which happens after i915_globals_init() and we need to clean > > up those globals. While we're here, add an explicit boolean instead of > > using a random field from i915_pci_device to detect partial loads. > > > > The mock selftests case gets especially sticky. The load isn't entirely > > a no-op. We actually do quite a bit inside those selftests including > > allocating a bunch of mock objects and running tests on them. Once all > > those tests are complete, we exit early from i915_init(). Previously, > > i915_init() would return a non-zero error code on failure and a zero > > error code on success. In the success case, we would get to i915_exit() > > and check i915_pci_driver.driver.owner to detect if i915_init exited early > > and do nothing. In the failure case, we would fail i915_init() but > > there would be no opportunity to clean up globals. > > > > The most annoying part is that you don't actually notice the failure as > > part of the self-tests since leaking a bit of memory, while bad, doesn't > > result in anything observable from userspace. Instead, the next time we > > load the driver (usually for the next IGT test), i915_globals_init() gets > > invoked again, we go to allocate a bunch of new memory slabs, those > > implicitly create debugfs entries, and debugfs warns that we're trying > > to create directories and files that already exist. Since this all > > happens as part of the next driver load, it shows up in the dmesg-warn > > of whatever IGT test ran after the mock selftests. 
> > > > While the obvious thing to do here might be to call i915_globals_exit() > > after selftests, that's not actually safe. The dma-buf selftests call > > i915_gem_prime_export which creates a file. We call dma_buf_put() on > > the resulting dmabuf which calls fput() on the file. However, fput() > > isn't immediate and gets flushed right before syscall returns. This > > means that all the fput()s from the selftests don't happen until right > > before the module load syscall used to fire off the selftests returns > > which is after i915_init(). If we call i915_globals_exit() in > > i915_init() after selftests, we end up freeing slabs out from under > > objects which won't get released until fput() is flushed at the end of > > the module load. > > > > The solution here is to let i915_init() return success early and detect > > the early success in i915_exit() and only tear down globals and nothing > > else. This way the module loads successfully, regardless of the success > > or failure of the tests. Because we've not enumerated any PCI devices, > > no device nodes are created and it's entirely useless from userspace. > > The only thing the module does at that point is hold on to a bit of > > memory until we unload it and i915_exit() is called. Importantly, this > > means that everything from our selftests has the ability to properly > > flush out between i915_init() and i915_exit() because there are a couple > > syscall boundaries in between. 
> > > > Signed-off-by: Jason Ekstrand > > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") > > Cc: Daniel Vetter > > --- > > drivers/gpu/drm/i915/i915_pci.c | 32 +--- > > 1 file changed, 25 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_pci.c > > b/drivers/gpu/drm/i915/i915_pci.c > > index 4e627b57d31a2..24e4e54516936 100644 > > --- a/drivers/gpu/drm/i915/i915_pci.c > > +++ b/drivers/gpu/drm/i915/i915_pci.c > > @@ -1194,18 +1194,31 @@ static struct pci_driver i915_pci_driver = { > > .driver.pm = &i915_pm_ops, > > }; > > > > +static bool i915_fully_loaded = false; > > + > > static int __init i915_init(void) > > { > > bool use_kms = true; > > int err; > > > > + i915_fully_loaded = false; > > + > > err = i915_globals_init(); > > if (err) > > return err; > > > > + /* i915_mock_selftests() only returns zero if no mock subtes
Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()
Sorry... didn't reply to everything the first time On Tue, Jul 20, 2021 at 3:25 AM Tvrtko Ursulin wrote: > > > On 19/07/2021 19:30, Jason Ekstrand wrote: > > If the driver was not fully loaded, we may still have globals lying > > around. If we don't tear those down in i915_exit(), we'll leak a bunch > > of memory slabs. This can happen two ways: use_kms = false and if we've > > run mock selftests. In either case, we have an early exit from > > i915_init which happens after i915_globals_init() and we need to clean > > up those globals. While we're here, add an explicit boolean instead of > > using a random field from i915_pci_device to detect partial loads. > > > > The mock selftests case gets especially sticky. The load isn't entirely > > a no-op. We actually do quite a bit inside those selftests including > > allocating a bunch of mock objects and running tests on them. Once all > > those tests are complete, we exit early from i915_init(). Previously, > > i915_init() would return a non-zero error code on failure and a zero > > error code on success. In the success case, we would get to i915_exit() > > and check i915_pci_driver.driver.owner to detect if i915_init exited early > > and do nothing. In the failure case, we would fail i915_init() but > > there would be no opportunity to clean up globals. > > > > The most annoying part is that you don't actually notice the failure as > > part of the self-tests since leaking a bit of memory, while bad, doesn't > > result in anything observable from userspace. Instead, the next time we > > load the driver (usually for the next IGT test), i915_globals_init() gets > > invoked again, we go to allocate a bunch of new memory slabs, those > > implicitly create debugfs entries, and debugfs warns that we're trying > > to create directories and files that already exist. Since this all > > happens as part of the next driver load, it shows up in the dmesg-warn > > of whatever IGT test ran after the mock selftests. 
> > Story checks out but I totally don't get why it wouldn't be noticed > until now. Was it perhaps part of the selftests contract that a reboot > is required after failure? If there is such a contract, CI doesn't follow it. We unload the driver after selftests but that's it. > > While the obvious thing to do here might be to call i915_globals_exit() > > after selftests, that's not actually safe. The dma-buf selftests call > > i915_gem_prime_export which creates a file. We call dma_buf_put() on > > the resulting dmabuf which calls fput() on the file. However, fput() > > isn't immediate and gets flushed right before syscall returns. This > > means that all the fput()s from the selftests don't happen until right > > before the module load syscall used to fire off the selftests returns > > which is after i915_init(). If we call i915_globals_exit() in > > i915_init() after selftests, we end up freeing slabs out from under > > objects which won't get released until fput() is flushed at the end of > > the module load. > > Nasty. Wasn't visible while globals memory leak was "in place". :I > > > The solution here is to let i915_init() return success early and detect > > the early success in i915_exit() and only tear down globals and nothing > > else. This way the module loads successfully, regardless of the success > > or failure of the tests. Because we've not enumerated any PCI devices, > > no device nodes are created and it's entirely useless from userspace. > > The only thing the module does at that point is hold on to a bit of > > memory until we unload it and i915_exit() is called. Importantly, this > > means that everything from our selftests has the ability to properly > > flush out between i915_init() and i915_exit() because there are a couple > > syscall boundaries in between. > > When you say "couple of syscall boundaries" you mean exactly two (module > init/unload) or there is more to it? 
Like why "couple" is needed and not > just that the module load syscall has exited? That part sounds > potentially dodgy. What mechanism is used by the delayed flush? It only needs the one syscall. I've changed the text to say "at least one syscall boundary". I think that's more clear without providing an exact count which may not be tractable. > Have you checked how this change interacts with the test runner and CI? As far as I know, there's no interesting interaction here. That said, I did just find that the live selftests fail the modprobe on selftest fa
Re: [PATCH] drm/i915: Check for nomodeset in i915_init() first
On Mon, Jul 19, 2021 at 3:35 AM Daniel Vetter wrote: > > Jason is trying to sort out the unwinding in i915_init and i915_exit, > while reviewing those patches I noticed that we also have the > nomodeset handling now in the middle of things. > > Pull that out for simplicity in unwinding - if you run selftest with > nomodeset you get nothing, *shrug*. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_pci.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 67696d752271..6fe709ac1b4b 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -1199,14 +1199,6 @@ static int __init i915_init(void) > bool use_kms = true; > int err; > > - err = i915_globals_init(); > - if (err) > - return err; > - > - err = i915_mock_selftests(); > - if (err) > - return err > 0 ? 0 : err; > - > /* > * Enable KMS by default, unless explicitly overriden by > * either the i915.modeset prarameter or by the > @@ -1225,6 +1217,14 @@ static int __init i915_init(void) > return 0; > } > > + err = i915_globals_init(); > + if (err) > + return err; > + > + err = i915_mock_selftests(); > + if (err) > + return err > 0 ? 0 : err; > + Annoyingly, this actually makes i915_exit() harder because now we need two conditionals: one for "do you have globals?" and one for "do you have anything at all?". It's actually easier to get right if we have i915_globals_init() /* Everything that can return 0 early */ fully_loaded = true /* Everything that can fail */ > i915_pmu_init(); > > err = pci_register_driver(&i915_pci_driver); > -- > 2.32.0 >
[PATCH 0/6] Fix the debugfs splat from mock selftests
This patch series fixes a miscellaneous collection of bugs that all add up to all our mock selftests throwing dmesg warnings in CI. As can be seen from "drm/i915: Use a table for i915_init/exit", it's especially fun since those warnings don't always show up in the selftests but can show up in other random IGTs depending on test execution order. Jason Ekstrand (6): drm/i915: Call i915_globals_exit() after i915_pmu_exit() drm/i915: Call i915_globals_exit() if pci_register_device() fails drm/i915: Use a table for i915_init/exit drm/ttm: Force re-init if ttm_global_init() fails drm/ttm: Initialize debugfs from ttm_global_init() drm/i915: Make the kmem slab for i915_buddy_block a global drivers/gpu/drm/i915/i915_buddy.c | 44 ++-- drivers/gpu/drm/i915/i915_buddy.h | 3 +- drivers/gpu/drm/i915/i915_globals.c | 6 +- drivers/gpu/drm/i915/i915_pci.c | 103 -- drivers/gpu/drm/i915/i915_perf.c | 3 +- drivers/gpu/drm/i915/i915_perf.h | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 4 +- drivers/gpu/drm/i915/i915_pmu.h | 4 +- .../gpu/drm/i915/selftests/i915_selftest.c| 2 +- drivers/gpu/drm/ttm/ttm_device.c | 14 +++ drivers/gpu/drm/ttm/ttm_module.c | 16 --- 11 files changed, 134 insertions(+), 67 deletions(-) -- 2.31.1
[PATCH 1/6] drm/i915: Call i915_globals_exit() after i915_pmu_exit()
We should tear down in the opposite order we set up. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 67696d7522718..50ed93b03e582 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1244,8 +1244,8 @@ static void __exit i915_exit(void) i915_perf_sysctl_unregister(); pci_unregister_driver(&i915_pci_driver); - i915_globals_exit(); i915_pmu_exit(); + i915_globals_exit(); } module_init(i915_init); -- 2.31.1
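The rule patch 1/6 enforces, sketched in a tiny userspace model (the init/exit functions are hypothetical stand-ins, not the real i915 entry points): teardown should mirror setup in reverse, so the last subsystem brought up is the first torn down.

```c
#include <assert.h>

static char order[4];
static int n;

static void record(char c) { order[n++] = c; }

/* Hypothetical pairs standing in for i915_globals_init()/i915_pmu_init()
 * and their exits. */
static void globals_init(void) { record('G'); }
static void pmu_init(void)     { record('P'); }
static void pmu_exit(void)     { record('p'); }
static void globals_exit(void) { record('g'); }

static void mod_init(void)
{
	globals_init();
	pmu_init();
}

/* The fix: exit in the reverse order of init, so nothing is torn down
 * while something initialized after it might still depend on it. */
static void mod_exit(void)
{
	pmu_exit();
	globals_exit();
}
```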
[PATCH 2/6] drm/i915: Call i915_globals_exit() if pci_register_device() fails
In the unlikely event that pci_register_device() fails, we were tearing down our PMU setup but not globals. This leaves a bunch of memory slabs lying around. Signed-off-by: Jason Ekstrand Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_globals.c | 4 ++-- drivers/gpu/drm/i915/i915_pci.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c index 77f1911c463b8..87267e1d2ad92 100644 --- a/drivers/gpu/drm/i915/i915_globals.c +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -138,7 +138,7 @@ void i915_globals_unpark(void) atomic_inc(&active); } -static void __exit __i915_globals_flush(void) +static void __i915_globals_flush(void) { atomic_inc(&active); /* skip shrinking */ @@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void) atomic_dec(&active); } -void __exit i915_globals_exit(void) +void i915_globals_exit(void) { GEM_BUG_ON(atomic_read(&active)); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 50ed93b03e582..4e627b57d31a2 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1230,6 +1230,7 @@ static int __init i915_init(void) err = pci_register_driver(&i915_pci_driver); if (err) { i915_pmu_exit(); + i915_globals_exit(); return err; } -- 2.31.1
[PATCH 3/6] drm/i915: Use a table for i915_init/exit
If the driver was not fully loaded, we may still have globals lying around. If we don't tear those down in i915_exit(), we'll leak a bunch of memory slabs. This can happen two ways: use_kms = false and if we've run mock selftests. In either case, we have an early exit from i915_init which happens after i915_globals_init() and we need to clean up those globals. The mock selftests case is especially sticky. The load isn't entirely a no-op. We actually do quite a bit inside those selftests including allocating a bunch of mock objects and running tests on them. Once all those tests are complete, we exit early from i915_init(). Previously, i915_init() would return a non-zero error code on failure and a zero error code on success. In the success case, we would get to i915_exit() and check i915_pci_driver.driver.owner to detect if i915_init exited early and do nothing. In the failure case, we would fail i915_init() but there would be no opportunity to clean up globals. The most annoying part is that you don't actually notice the failure as part of the self-tests since leaking a bit of memory, while bad, doesn't result in anything observable from userspace. Instead, the next time we load the driver (usually for the next IGT test), i915_globals_init() gets invoked again, we go to allocate a bunch of new memory slabs, those implicitly create debugfs entries, and debugfs warns that we're trying to create directories and files that already exist. Since this all happens as part of the next driver load, it shows up in the dmesg-warn of whatever IGT test ran after the mock selftests. While the obvious thing to do here might be to call i915_globals_exit() after selftests, that's not actually safe. The dma-buf selftests call i915_gem_prime_export which creates a file. We call dma_buf_put() on the resulting dmabuf which calls fput() on the file. However, fput() isn't immediate and gets flushed right before syscall returns. 
This means that all the fput()s from the selftests don't happen until right before the module load syscall used to fire off the selftests returns which is after i915_init(). If we call i915_globals_exit() in i915_init() after selftests, we end up freeing slabs out from under objects which won't get released until fput() is flushed at the end of the module load syscall. The solution here is to let i915_init() return success early and detect the early success in i915_exit() and only tear down globals and nothing else. This way the module loads successfully, regardless of the success or failure of the tests. Because we've not enumerated any PCI devices, no device nodes are created and it's entirely useless from userspace. The only thing the module does at that point is hold on to a bit of memory until we unload it and i915_exit() is called. Importantly, this means that everything from our selftests has the ability to properly flush out between i915_init() and i915_exit() because there is at least one syscall boundary in between. In order to handle all the delicate init/exit cases, we convert the whole thing to a table of init/exit pairs and track the init status in the new init_progress global. This allows us to ensure that i915_exit() always tears down exactly the things that i915_init() successfully initialized. We also allow early-exit of i915_init() without failure by an init function returning > 0. This is useful for nomodeset and selftests. For the mock selftests, we convert them to always return 1 so we get the desired behavior of the module always loading successfully and then properly tearing down the partially loaded driver. 
Signed-off-by: Jason Ekstrand Cc: Daniel Vetter Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_pci.c | 104 -- drivers/gpu/drm/i915/i915_perf.c | 3 +- drivers/gpu/drm/i915/i915_perf.h | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 4 +- drivers/gpu/drm/i915/i915_pmu.h | 4 +- .../gpu/drm/i915/selftests/i915_selftest.c| 2 +- 6 files changed, 80 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4e627b57d31a2..64ebd89eae6ce 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev) i915_driver_shutdown(i915); } -static struct pci_driver i915_pci_driver = { - .name = DRIVER_NAME, - .id_table = pciidlist, - .probe = i915_pci_probe, - .remove = i915_pci_remove, - .shutdown = i915_pci_shutdown, - .driver.pm = &i915_pm_ops, -}; - -static int __init i915_init(void) +static int i915_check_nomodeset(void) { bool use_kms = true; - int err; - - err = i915_globals_init(); - if (err) - return err; - - err = i915_moc
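The table-driven shape described in the commit message can be sketched as below. This is an illustration under assumed names, not the exact i915_pci.c code; in particular, whether the entry that returns > 0 needs its own teardown is simplified away here (this sketch treats it as not needing one).

```c
#include <assert.h>

struct init_entry {
	int (*init)(void);	/* <0 = error, 0 = ok, >0 = early success */
	void (*exit)(void);
};

static int exits_called;
static void count_exit(void) { exits_called++; }

static int ok_init(void)    { return 0; }
static int early_init(void) { return 1; }	/* e.g. mock selftests ran */

static const struct init_entry table[] = {
	{ ok_init,    count_exit },
	{ early_init, count_exit },
	{ ok_init,    count_exit },	/* skipped on early success */
};
#define NENTRIES ((int)(sizeof(table) / sizeof(table[0])))

static int init_progress;

static int mod_init(void)
{
	int i, err = 0;

	for (i = 0; i < NENTRIES; i++) {
		err = table[i].init();
		if (err)
			break;
	}
	if (err < 0) {
		while (i--)		/* real failure: unwind and report */
			table[i].exit();
		return err;
	}
	init_progress = i;		/* remember how far init got */
	return 0;			/* err > 0 still "loads" the module */
}

static void mod_exit(void)
{
	/* Tear down exactly what init completed, in reverse order. */
	while (init_progress--)
		table[init_progress].exit();
}
```

The key property is that mod_exit() consults init_progress rather than re-deriving module state, so a partial (early-success) load is unwound just as safely as a full one.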
[PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails
If we have a failure, decrement the reference count so that the next call to ttm_global_init() will actually do something instead of assume everything is all set up. Signed-off-by: Jason Ekstrand Fixes: 62b53b37e4b1 ("drm/ttm: use a static ttm_bo_global instance") Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index 5f31acec3ad76..519deea8e39b7 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -100,6 +100,8 @@ static int ttm_global_init(void) debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root, &glob->bo_count); out: + if (ret) + --ttm_glob_use_count; mutex_unlock(&ttm_global_mutex); return ret; } -- 2.31.1
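The bug pattern this fixes generalizes to any refcount-guarded one-time init; a minimal userspace model follows (hypothetical names, with an injectable failure standing in for the real setup work): if the use count is not decremented on the failure path, the next caller sees count > 0 and wrongly assumes setup already succeeded.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned use_count;
static bool inited;

/* 'fail' lets a caller simulate a first call that errors out. */
static int global_init(bool fail)
{
	int ret = 0;

	if (use_count++ > 0)	/* someone already (thinks they) set up */
		return 0;

	if (fail) {
		ret = -1;
		goto out;
	}
	inited = true;
out:
	if (ret)
		--use_count;	/* the fix: drop the ref on failure */
	return ret;
}
```

Without the `--use_count` in the error path, the second call would return 0 with `inited` still false — exactly the "assume everything is all set up" hazard the patch closes.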
[PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init()
We create a bunch of debugfs entries as a side-effect of ttm_global_init() and then never clean them up. This isn't usually a problem because we free the whole debugfs directory on module unload. However, if the global reference count ever goes to zero and then ttm_global_init() is called again, we'll re-create those debugfs entries and debugfs will complain in dmesg that we're creating entries that already exist. This patch fixes this problem by changing the lifetime of the whole TTM debugfs directory to match that of the TTM global state. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/ttm/ttm_device.c | 12 drivers/gpu/drm/ttm/ttm_module.c | 16 2 files changed, 12 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index 519deea8e39b7..74e3b460132b3 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -44,6 +44,8 @@ static unsigned ttm_glob_use_count; struct ttm_global ttm_glob; EXPORT_SYMBOL(ttm_glob); +struct dentry *ttm_debugfs_root; + static void ttm_global_release(void) { struct ttm_global *glob = &ttm_glob; @@ -53,6 +55,7 @@ static void ttm_global_release(void) goto out; ttm_pool_mgr_fini(); + debugfs_remove(ttm_debugfs_root); __free_page(glob->dummy_read_page); memset(glob, 0, sizeof(*glob)); @@ -73,6 +76,13 @@ static int ttm_global_init(void) si_meminfo(&si); + ttm_debugfs_root = debugfs_create_dir("ttm", NULL); + if (IS_ERR(ttm_debugfs_root)) { + ret = PTR_ERR(ttm_debugfs_root); + ttm_debugfs_root = NULL; + goto out; + } + /* Limit the number of pages in the pool to about 50% of the total * system memory. 
*/ @@ -100,6 +110,8 @@ static int ttm_global_init(void) debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root, &glob->bo_count); out: + if (ret && ttm_debugfs_root) + debugfs_remove(ttm_debugfs_root); if (ret) --ttm_glob_use_count; mutex_unlock(&ttm_global_mutex); diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c index 997c458f68a9a..7fcdef278c742 100644 --- a/drivers/gpu/drm/ttm/ttm_module.c +++ b/drivers/gpu/drm/ttm/ttm_module.c @@ -72,22 +72,6 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp) return tmp; } -struct dentry *ttm_debugfs_root; - -static int __init ttm_init(void) -{ - ttm_debugfs_root = debugfs_create_dir("ttm", NULL); - return 0; -} - -static void __exit ttm_exit(void) -{ - debugfs_remove(ttm_debugfs_root); -} - -module_init(ttm_init); -module_exit(ttm_exit); - MODULE_AUTHOR("Thomas Hellstrom, Jerome Glisse"); MODULE_DESCRIPTION("TTM memory manager subsystem (for DRM device)"); MODULE_LICENSE("GPL and additional rights"); -- 2.31.1
[PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global
There's no reason that I can tell why this should be per-i915_buddy_mm and doing so causes KMEM_CACHE to throw dmesg warnings because it tries to create a debugfs entry with the name i915_buddy_block multiple times. We could handle this by carefully giving each slab its own name but that brings its own pain because then we have to store that string somewhere and manage the lifetimes of the different slabs. The most likely outcome would be a global atomic which we increment to get a new name or something like that. The much easier solution is to use the i915_globals system like we do for every other slab in i915. This ensures that we have exactly one of them for each i915 driver load and it gets neatly created on module load and destroyed on module unload. Using the globals system also means that its now tied into the shrink handler so we can properly respond to low-memory situations. Signed-off-by: Jason Ekstrand Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man") Cc: Matthew Auld --- drivers/gpu/drm/i915/i915_buddy.c | 44 ++--- drivers/gpu/drm/i915/i915_buddy.h | 3 +- drivers/gpu/drm/i915/i915_globals.c | 2 ++ 3 files changed, 38 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_buddy.c b/drivers/gpu/drm/i915/i915_buddy.c index 29dd7d0310c1f..911feedad4513 100644 --- a/drivers/gpu/drm/i915/i915_buddy.c +++ b/drivers/gpu/drm/i915/i915_buddy.c @@ -8,8 +8,14 @@ #include "i915_buddy.h" #include "i915_gem.h" +#include "i915_globals.h" #include "i915_utils.h" +static struct i915_global_buddy { + struct i915_global base; + struct kmem_cache *slab_blocks; +} global; + static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm, struct i915_buddy_block *parent, unsigned int order, @@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm, GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER); - block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL); + block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL); if 
(!block) return NULL; @@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm, static void i915_block_free(struct i915_buddy_mm *mm, struct i915_buddy_block *block) { - kmem_cache_free(mm->slab_blocks, block); + kmem_cache_free(global.slab_blocks, block); } static void mark_allocated(struct i915_buddy_block *block) @@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size) GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER); - mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN); - if (!mm->slab_blocks) - return -ENOMEM; - mm->free_list = kmalloc_array(mm->max_order + 1, sizeof(struct list_head), GFP_KERNEL); if (!mm->free_list) - goto out_destroy_slab; + return -ENOMEM; for (i = 0; i <= mm->max_order; ++i) INIT_LIST_HEAD(&mm->free_list[i]); @@ -145,8 +147,6 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size) kfree(mm->roots); out_free_list: kfree(mm->free_list); -out_destroy_slab: - kmem_cache_destroy(mm->slab_blocks); return -ENOMEM; } @@ -161,7 +161,6 @@ void i915_buddy_fini(struct i915_buddy_mm *mm) kfree(mm->roots); kfree(mm->free_list); - kmem_cache_destroy(mm->slab_blocks); } static int split_block(struct i915_buddy_mm *mm, @@ -410,3 +409,28 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm, #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_buddy.c" #endif + +static void i915_global_buddy_shrink(void) +{ + kmem_cache_shrink(global.slab_blocks); +} + +static void i915_global_buddy_exit(void) +{ + kmem_cache_destroy(global.slab_blocks); +} + +static struct i915_global_buddy global = { { + .shrink = i915_global_buddy_shrink, + .exit = i915_global_buddy_exit, +} }; + +int __init i915_global_buddy_init(void) +{ + global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0); + if (!global.slab_blocks) + return -ENOMEM; + + i915_global_register(&global.base); + return 0; +} diff --git a/drivers/gpu/drm/i915/i915_buddy.h 
b/drivers/gpu/drm/i915/i915_buddy.h index 37f8c42071d12..d8f26706de52f 100644 --- a/drivers/gpu/drm/i915/i915_buddy.h +++ b/drivers/gpu/drm/i915/i915_buddy.h @@ -47,7 +47,6 @@ struct i915_buddy_block { * i915_buddy_alloc* and i915_buddy_free* should suffice. */ struct i915_buddy_mm { - struct kmem_cach
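The shape of the change — one process-wide cache shared by every i915_buddy_mm instead of a cache per instance — can be sketched as a userspace model in plain C. Everything below (fake_cache, buddy_global_init, and friends) is an illustrative stand-in, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for a kmem_cache: one per "module load",
 * shared by every buddy allocator instance. */
struct fake_cache { size_t obj_size; int live; };

static struct fake_cache *global_slab;

static int buddy_global_init(void)
{
	global_slab = malloc(sizeof(*global_slab));
	if (!global_slab)
		return -1;
	global_slab->obj_size = 64;
	global_slab->live = 0;
	return 0;
}

static void buddy_global_exit(void)
{
	free(global_slab);
	global_slab = NULL;
}

struct block { int order; };

/* Every buddy_mm instance allocates from the same global cache,
 * instead of each one creating (and debugfs-registering) its own. */
static struct block *block_alloc(int order)
{
	struct block *b = calloc(1, sizeof(*b));
	if (b) {
		b->order = order;
		global_slab->live++;
	}
	return b;
}

static void block_free(struct block *b)
{
	global_slab->live--;
	free(b);
}
```

Because init/exit are tied to module load/unload rather than to i915_buddy_init()/i915_buddy_fini(), the duplicate-name debugfs warning cannot recur no matter how many buddy allocators are created.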
Re: [Intel-gfx] [PATCH 7/7] drm/i915/gem: Migrate to system at dma-buf attach time (v6)
Fixed all the nits below locally. It'll be in the next send. On Tue, Jul 20, 2021 at 5:53 AM Matthew Auld wrote: > > On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand wrote: > > > > From: Thomas Hellström > > > > Until we support p2p dma or as a complement to that, migrate data > > to system memory at dma-buf attach time if possible. > > > > v2: > > - Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver > > selftest to migrate if we are LMEM capable. > > v3: > > - Migrate also in the pin() callback. > > v4: > > - Migrate in attach > > v5: (jason) > > - Lock around the migration > > v6: (jason) > > - Move the can_migrate check outside the lock > > - Rework the selftests to test more migration conditions. In > > particular, SMEM, LMEM, and LMEM+SMEM are all checked. > > > > Signed-off-by: Thomas Hellström > > Signed-off-by: Michael J. Ruhl > > Reported-by: kernel test robot > > Signed-off-by: Jason Ekstrand > > Reviewed-by: Jason Ekstrand > > --- > > drivers/gpu/drm/i915/gem/i915_gem_create.c| 2 +- > > drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 - > > drivers/gpu/drm/i915/gem/i915_gem_object.h| 4 + > > .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 89 ++- > > 4 files changed, 112 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > index 039e4f3b39c79..41c4cd3e1ea01 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > @@ -82,7 +82,7 @@ static int i915_gem_publish(struct drm_i915_gem_object > > *obj, > > return 0; > > } > > > > -static struct drm_i915_gem_object * > > +struct drm_i915_gem_object * > > i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, > > struct intel_memory_region **placements, > > unsigned int n_placements) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > index 9a655f69a0671..5d438b95826b9 100644 > > 
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > @@ -170,8 +170,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf > > *dmabuf, > > struct dma_buf_attachment *attach) > > { > > struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); > > + struct i915_gem_ww_ctx ww; > > + int err; > > + > > + if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) > > + return -EOPNOTSUPP; > > + > > + for_i915_gem_ww(&ww, err, true) { > > + err = i915_gem_object_lock(obj, &ww); > > + if (err) > > + continue; > > + > > + err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM); > > + if (err) > > + continue; > > > > - return i915_gem_object_pin_pages_unlocked(obj); > > + err = i915_gem_object_wait_migration(obj, 0); > > + if (err) > > + continue; > > + > > + err = i915_gem_object_pin_pages(obj); > > + } > > + > > + return err; > > } > > > > static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf, > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h > > index 8be4fadeee487..fbae53bd46384 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h > > @@ -61,6 +61,10 @@ i915_gem_object_create_shmem(struct drm_i915_private > > *i915, > > struct drm_i915_gem_object * > > i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, > >const void *data, resource_size_t > > size); > > +struct drm_i915_gem_object * > > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, > > + struct intel_memory_region **placements, > > + unsigned int n_placements); > > > > extern const struct drm_i915_gem_object_ops i915_gem_shmem_ops; > > > > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.
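The for_i915_gem_ww loop used in the attach hook above is a retry construct: the body re-runs whenever a ww lock attempt backs off with -EDEADLK, and exits on success or any other error. A minimal userspace model of that control flow (the function names here are made up for illustration, not the i915 API):

```c
#include <assert.h>
#include <errno.h>

static int attempts;

/* Pretend the first attempt hits ww lock contention and backs off. */
static int try_lock_and_migrate(void)
{
	return (++attempts == 1) ? -EDEADLK : 0;
}

/* Model of the for_i915_gem_ww pattern: "continue" jumps to the loop
 * condition, which retries only on -EDEADLK; any other error (or
 * success) falls out of the loop. */
static int attach_migrate(void)
{
	int err;

	do {
		err = try_lock_and_migrate();
		if (err)
			continue;	/* re-evaluate the retry condition */
		/* ... migrate, wait, and pin pages here ... */
	} while (err == -EDEADLK);

	return err;
}
```

This is why each step in the real attach hook ends with "if (err) continue;" rather than an early return: the error has to reach the loop condition so a deadlock backoff can restart the whole locked sequence.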
Re: [PATCH 6/7] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v6)
On Tue, Jul 20, 2021 at 4:07 AM Matthew Auld wrote: > > On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand wrote: > > > > From: Thomas Hellström > > > > If our exported dma-bufs are imported by another instance of our driver, > > that instance will typically have the imported dma-bufs locked during > > dma_buf_map_attachment(). But the exporter also locks the same reservation > > object in the map_dma_buf() callback, which leads to recursive locking. > > > > So taking the lock inside _pin_pages_unlocked() is incorrect. > > > > Additionally, the current pinning code path is contrary to the defined > > way that pinning should occur. > > > > Remove the explicit pin/unpin from the map/unmap functions and move them > > to the attach/detach allowing correct locking to occur, and to match > > the static dma-buf drm_prime pattern. > > > > Add a live selftest to exercise both dynamic and non-dynamic > > exports. > > > > v2: > > - Extend the selftest with a fake dynamic importer. > > - Provide real pin and unpin callbacks to not abuse the interface. > > v3: (ruhl) > > - Remove the dynamic export support and move the pinning into the > > attach/detach path. > > v4: (ruhl) > > - Put pages does not need to assert on the dma-resv > > v5: (jason) > > - Lock around dma_buf_unmap_attachment() when emulating a dynamic > > importer in the subtests. > > - Use pin_pages_unlocked > > v6: (jason) > > - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests > > > > Reported-by: Michael J. Ruhl > > Signed-off-by: Thomas Hellström > > Signed-off-by: Michael J.
Ruhl > > Signed-off-by: Jason Ekstrand > > Reviewed-by: Jason Ekstrand > > --- > > drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 43 ++-- > > .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 103 +- > > 2 files changed, 132 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > index 616c3a2f1baf0..9a655f69a0671 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c > > @@ -12,6 +12,8 @@ > > #include "i915_gem_object.h" > > #include "i915_scatterlist.h" > > > > +I915_SELFTEST_DECLARE(static bool force_different_devices;) > > + > > static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf) > > { > > return to_intel_bo(buf->priv); > > @@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct > > dma_buf_attachment *attachme > > struct scatterlist *src, *dst; > > int ret, i; > > > > - ret = i915_gem_object_pin_pages_unlocked(obj); > > - if (ret) > > - goto err; > > - > > /* Copy sg so that we make an independent mapping */ > > st = kmalloc(sizeof(struct sg_table), GFP_KERNEL); > > if (st == NULL) { > > ret = -ENOMEM; > > - goto err_unpin_pages; > > + goto err; > > } > > > > ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); > > @@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct > > dma_buf_attachment *attachme > > sg_free_table(st); > > err_free: > > kfree(st); > > -err_unpin_pages: > > - i915_gem_object_unpin_pages(obj); > > err: > > return ERR_PTR(ret); > > } > > @@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct > > dma_buf_attachment *attachment, > >struct sg_table *sg, > >enum dma_data_direction dir) > > { > > - struct drm_i915_gem_object *obj = > > dma_buf_to_obj(attachment->dmabuf); > > - > > dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC); > > sg_free_table(sg); > > kfree(sg); > > - > > - i915_gem_object_unpin_pages(obj); > > } > > > > 
static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct > > dma_buf_map *map) > > @@ -168,7 +160,31 @@ static int i915_gem_end_cpu_access(struct dma_buf > > *dma_buf, enum dma_data_direct > > return err; > > } > > > > +/** > > + * i915_gem_dmabuf_attach - Do any extra attach work necessary > > + * @dmabuf: imported dma-buf > > + * @attach: new attach to do
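The core of the locking fix is moving the pin from map/unmap to attach/detach, so map_dma_buf never has to take the reservation lock recursively: the pages are pinned once for the lifetime of the attachment, and every map can simply assume they are resident. A toy model of that pin-lifetime change (illustrative names, not the i915 functions):

```c
#include <assert.h>

struct obj { int pin_count; };

/* Pin once per attachment lifetime... */
static void attach(struct obj *o) { o->pin_count++; }
/* ...and unpin when the attachment goes away. */
static void detach(struct obj *o) { o->pin_count--; }

/* map may now assume attach already pinned the pages, so it never
 * needs the reservation lock that the importer may already hold. */
static int map(struct obj *o) { return o->pin_count > 0 ? 0 : -1; }
static void unmap(struct obj *o) { (void)o; /* no unpin here any more */ }

static int scenario(void)
{
	struct obj o = {0};

	attach(&o);
	if (map(&o) != 0 || map(&o) != 0)	/* repeated maps: no extra pin */
		return -1;
	unmap(&o);
	unmap(&o);
	detach(&o);
	return o.pin_count;			/* balanced: back to zero */
}
```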
Re: [PATCH 3/7] drm/i915/gem: Unify user object creation
On Tue, Jul 20, 2021 at 4:35 AM Matthew Auld wrote: > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand wrote: > > > > Instead of hand-rolling the same three calls in each function, pull them > > into an i915_gem_object_create_user helper. Apart from re-ordering of > > the placements array ENOMEM check, the only functional change here > > should be that i915_gem_dumb_create now calls i915_gem_flush_free_objects > > which it probably should have been calling all along. > > > > Signed-off-by: Jason Ekstrand > > --- > > drivers/gpu/drm/i915/gem/i915_gem_create.c | 106 + > > 1 file changed, 43 insertions(+), 63 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > index 391c8c4a12172..69bf9ec777642 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > @@ -11,13 +11,14 @@ > > #include "i915_trace.h" > > #include "i915_user_extensions.h" > > > > -static u32 object_max_page_size(struct drm_i915_gem_object *obj) > > +static u32 object_max_page_size(struct intel_memory_region **placements, > > + unsigned int n_placements) > > { > > u32 max_page_size = 0; > > int i; > > > > - for (i = 0; i < obj->mm.n_placements; i++) { > > - struct intel_memory_region *mr = obj->mm.placements[i]; > > + for (i = 0; i < n_placements; i++) { > > + struct intel_memory_region *mr = placements[i]; > > > > GEM_BUG_ON(!is_power_of_2(mr->min_page_size)); > > max_page_size = max_t(u32, max_page_size, > > mr->min_page_size); > > @@ -81,22 +82,35 @@ static int i915_gem_publish(struct drm_i915_gem_object > > *obj, > > return 0; > > } > > > > -static int > > -i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) > > +static struct drm_i915_gem_object * > > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, > > + struct intel_memory_region **placements, > > + unsigned int n_placements) > > { > > - struct intel_memory_region *mr = 
obj->mm.placements[0]; > > + struct intel_memory_region *mr = placements[0]; > > + struct drm_i915_gem_object *obj; > > unsigned int flags; > > int ret; > > > > - size = round_up(size, object_max_page_size(obj)); > > + i915_gem_flush_free_objects(i915); > > + > > + obj = i915_gem_object_alloc(); > > + if (!obj) > > + return ERR_PTR(-ENOMEM); > > + > > + size = round_up(size, object_max_page_size(placements, > > n_placements)); > > if (size == 0) > > - return -EINVAL; > > + return ERR_PTR(-EINVAL); > > > > /* For most of the ABI (e.g. mmap) we think in system pages */ > > GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); > > > > if (i915_gem_object_size_2big(size)) > > - return -E2BIG; > > + return ERR_PTR(-E2BIG); > > + > > + ret = object_set_placements(obj, placements, n_placements); > > + if (ret) > > + goto object_free; > > Thinking on this again, it might be way too thorny to expose > create_user as-is to other parts of i915, like we do in the last > patch. Since the caller will be expected to manually validate the > placements, otherwise we might crash and burn in weird ways as new > users pop up. i.e it needs the same validation that happens as part of > the extension. Also as new extensions arrive, like with PXP, that also > has to get bolted onto create_user, which might have its own hidden > constraints. Perhaps. Do you have a suggestion for how to make it available to selftests without exposing it to "the rest of i915"? If you want, I can make create_user duplicate the placements uniqueness check. That's really the only validation currently in the ioctl besides all the stuff for making sure that the class/instance provided by the user isn't bogus. But if we've got real i915_memory_region pointers, we don't need that. --Jason
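The new i915_gem_object_create_user helper returns errors via the kernel's ERR_PTR convention rather than a separate out-parameter, which is why the refactor changes the return type from int to a pointer. A self-contained userspace model of that idiom, with a hypothetical create_user stand-in (the constants and struct are illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-implementations of the kernel's ERR_PTR helpers:
 * small negative errnos ride in the (otherwise invalid) top page
 * of the pointer space. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

struct user_obj { uint64_t size; };

/* Hypothetical stand-in for i915_gem_object_create_user(): one helper
 * doing the shared validation instead of each ioctl hand-rolling it. */
static struct user_obj *create_user(uint64_t size)
{
	struct user_obj *o;

	if (size == 0)
		return ERR_PTR(-EINVAL);
	o = calloc(1, sizeof(*o));
	if (!o)
		return ERR_PTR(-ENOMEM);
	o->size = size;
	return o;
}
```

The caller pattern is then "obj = create_user(...); if (IS_ERR(obj)) return PTR_ERR(obj);", which is what lets the three ioctl paths collapse into one helper.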
Re: [PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)
On Mon, Jul 19, 2021 at 3:18 AM Matthew Auld wrote: > > On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand wrote: > > > > Since we don't allow changing the set of regions after creation, we can > > make ext_set_placements() build up the region set directly in the > > create_ext and assign it to the object later. This is similar to what > > we did for contexts with the proto-context only simpler because there's > > no funny object shuffling. This will be used in the next patch to allow > > us to de-duplicate a bunch of code. Also, since we know the maximum > > number of regions up-front, we can use a fixed-size temporary array for > > the regions. This simplifies memory management a bit for this new > > delayed approach. > > > > v2 (Matthew Auld): > > - Get rid of MAX_N_PLACEMENTS > > - Drop kfree(placements) from set_placements() > > > > Signed-off-by: Jason Ekstrand > > Cc: Matthew Auld > > --- > > drivers/gpu/drm/i915/gem/i915_gem_create.c | 81 -- > > 1 file changed, 45 insertions(+), 36 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > index 51f92e4b1a69d..5766749a449c0 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct > > drm_i915_gem_object *obj) > > return max_page_size; > > } > > > > -static void object_set_placements(struct drm_i915_gem_object *obj, > > - struct intel_memory_region **placements, > > - unsigned int n_placements) > > +static int object_set_placements(struct drm_i915_gem_object *obj, > > +struct intel_memory_region **placements, > > +unsigned int n_placements) > > { > > + struct intel_memory_region **arr; > > + unsigned int i; > > + > > GEM_BUG_ON(!n_placements); > > > > /* > > @@ -44,9 +47,20 @@ static void object_set_placements(struct > > drm_i915_gem_object *obj, > > obj->mm.placements = &i915->mm.regions[mr->id]; > > obj->mm.n_placements = 1; 
> > } else { > > - obj->mm.placements = placements; > > + arr = kmalloc_array(n_placements, > > + sizeof(struct intel_memory_region *), > > + GFP_KERNEL); > > + if (!arr) > > + return -ENOMEM; > > + > > + for (i = 0; i < n_placements; i++) > > + arr[i] = placements[i]; > > + > > + obj->mm.placements = arr; > > obj->mm.n_placements = n_placements; > > } > > + > > + return 0; > > } > > > > static int i915_gem_publish(struct drm_i915_gem_object *obj, > > @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file, > > return -ENOMEM; > > > > mr = intel_memory_region_by_type(to_i915(dev), mem_type); > > - object_set_placements(obj, &mr, 1); > > + ret = object_set_placements(obj, &mr, 1); > > + if (ret) > > + goto object_free; > > > > ret = i915_gem_setup(obj, args->size); > > if (ret) > > @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void > > *data, > > return -ENOMEM; > > > > mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM); > > - object_set_placements(obj, &mr, 1); > > + ret = object_set_placements(obj, &mr, 1); > > + if (ret) > > + goto object_free; > > > > ret = i915_gem_setup(obj, args->size); > > if (ret) > > @@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void > > *data, > > > > struct create_ext { > > struct drm_i915_private *i915; > > - struct drm_i915_gem_object *vanilla_object; > > + struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; > > + unsigned int n_placements; > > }; > > > > static void repr_placements(char *buf, size_t size, > > @@ -230,8 +249,7 @@ static int set_placements(struct > > drm_i915_gem_create_ext_memory_regions *args, > > struct drm_i915_private *i91
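The v2 change — build the placement list in a fixed-size temporary on the caller's side and copy it into the object only when there is more than one entry — can be modeled like this (regions reduced to ints, all names illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define N_REGIONS 4
static int region_table[N_REGIONS];

struct obj {
	int *placements;
	unsigned int n_placements;
};

/* Model of object_set_placements(): a single placement aliases a
 * long-lived table entry (nothing to free later); multiple placements
 * get a private copy so the caller's stack array can go away. */
static int set_placements(struct obj *o, const int *ids, unsigned int n)
{
	if (n == 1) {
		o->placements = &region_table[ids[0]];
		o->n_placements = 1;
		return 0;
	}
	o->placements = malloc(n * sizeof(*o->placements));
	if (!o->placements)
		return -ENOMEM;
	memcpy(o->placements, ids, n * sizeof(*ids));
	o->n_placements = n;
	return 0;
}

static int self_check(void)
{
	struct obj a = {0}, b = {0};
	int one[] = { 2 };
	int two[] = { 0, 3 };

	if (set_placements(&a, one, 1) || a.placements != &region_table[2])
		return 0;
	if (set_placements(&b, two, 2) || b.placements == two)
		return 0;
	return b.placements[1] == 3;
}
```

The singleton case is why the v2 version can "drop kfree(placements)": freeing is only ever needed when the object owns a copied array.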
Re: [Intel-gfx] [PATCH 3/6] drm/i915: Use a table for i915_init/exit
On Wed, Jul 21, 2021 at 4:06 AM Tvrtko Ursulin wrote: > > > On 20/07/2021 19:13, Jason Ekstrand wrote: > > If the driver was not fully loaded, we may still have globals lying > > around. If we don't tear those down in i915_exit(), we'll leak a bunch > > of memory slabs. This can happen two ways: use_kms = false and if we've > > run mock selftests. In either case, we have an early exit from > > i915_init which happens after i915_globals_init() and we need to clean > > up those globals. > > > > The mock selftests case is especially sticky. The load isn't entirely > > a no-op. We actually do quite a bit inside those selftests including > > allocating a bunch of mock objects and running tests on them. Once all > > those tests are complete, we exit early from i915_init(). Previously, > > i915_init() would return a non-zero error code on failure and a zero > > error code on success. In the success case, we would get to i915_exit() > > and check i915_pci_driver.driver.owner to detect if i915_init exited early > > and do nothing. In the failure case, we would fail i915_init() but > > there would be no opportunity to clean up globals. > > > > The most annoying part is that you don't actually notice the failure as > > part of the self-tests since leaking a bit of memory, while bad, doesn't > > result in anything observable from userspace. Instead, the next time we > > load the driver (usually for next IGT test), i915_globals_init() gets > > invoked again, we go to allocate a bunch of new memory slabs, those > > implicitly create debugfs entries, and debugfs warns that we're trying > > to create directories and files that already exist. Since this all > > happens as part of the next driver load, it shows up in the dmesg-warn > > of whatever IGT test ran after the mock selftests. > > > > While the obvious thing to do here might be to call i915_globals_exit() > > after selftests, that's not actually safe.
The dma-buf selftests call > > i915_gem_prime_export which creates a file. We call dma_buf_put() on > > the resulting dmabuf which calls fput() on the file. However, fput() > > isn't immediate and gets flushed right before syscall returns. This > > means that all the fput()s from the selftests don't happen until right > > before the module load syscall used to fire off the selftests returns > > which is after i915_init(). If we call i915_globals_exit() in > > i915_init() after selftests, we end up freeing slabs out from under > > objects which won't get released until fput() is flushed at the end of > > the module load syscall. > > > > The solution here is to let i915_init() return success early and detect > > the early success in i915_exit() and only tear down globals and nothing > > else. This way the module loads successfully, regardless of the success > > or failure of the tests. Because we've not enumerated any PCI devices, > > no device nodes are created and it's entirely useless from userspace. > > The only thing the module does at that point is hold on to a bit of > > memory until we unload it and i915_exit() is called. Importantly, this > > means that everything from our selftests has the ability to properly > > flush out between i915_init() and i915_exit() because there is at least > > one syscall boundary in between. > > > > In order to handle all the delicate init/exit cases, we convert the > > whole thing to a table of init/exit pairs and track the init status in > > the new init_progress global. This allows us to ensure that i915_exit() > > always tears down exactly the things that i915_init() successfully > > initialized. We also allow early-exit of i915_init() without failure by > > an init function returning > 0. This is useful for nomodeset, and > > selftests. 
For the mock selftests, we convert them to always return 1 > > so we get the desired behavior of the driver always succeeding to load > > the driver and then properly tearing down the partially loaded driver. > > > > Signed-off-by: Jason Ekstrand > > Cc: Daniel Vetter > > Cc: Tvrtko Ursulin > > --- > > drivers/gpu/drm/i915/i915_pci.c | 104 -- > > drivers/gpu/drm/i915/i915_perf.c | 3 +- > > drivers/gpu/drm/i915/i915_perf.h | 2 +- > > drivers/gpu/drm/i915/i915_pmu.c | 4 +- > > drivers/gpu/drm/i915/i915_pmu.h | 4 +- > > .../gpu/drm/i915/selftests/i915_selftest.c| 2 +- > > 6 files changed, 80 insertion
Re: [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()
On Wed, Jul 21, 2021 at 6:26 AM Daniel Vetter wrote: > > On Tue, Jul 20, 2021 at 09:55:22AM -0500, Jason Ekstrand wrote: > > On Tue, Jul 20, 2021 at 9:18 AM Daniel Vetter wrote: > > > > > > On Mon, Jul 19, 2021 at 01:30:44PM -0500, Jason Ekstrand wrote: > > > > If the driver was not fully loaded, we may still have globals lying > > > > around. If we don't tear those down in i915_exit(), we'll leak a bunch > > > > of memory slabs. This can happen two ways: use_kms = false and if we've > > > > run mock selftests. In either case, we have an early exit from > > > > i915_init which happens after i915_globals_init() and we need to clean > > > > up those globals. While we're here, add an explicit boolean instead of > > > > using a random field from i915_pci_device to detect partial loads. > > > > > > > > The mock selftests case gets especially sticky. The load isn't entirely > > > > a no-op. We actually do quite a bit inside those selftests including > > > > allocating a bunch of mock objects and running tests on them. Once all > > > > those tests are complete, we exit early from i915_init(). Previously, > > > > i915_init() would return a non-zero error code on failure and a zero > > > > error code on success. In the success case, we would get to i915_exit() > > > > and check i915_pci_driver.driver.owner to detect if i915_init exited > > > > early > > > > and do nothing. In the failure case, we would fail i915_init() but > > > > there would be no opportunity to clean up globals. > > > > > > > > The most annoying part is that you don't actually notice the failure as > > > > part of the self-tests since leaking a bit of memory, while bad, doesn't > > > > result in anything observable from userspace.
Instead, the next time we > > > > load the driver (usually for next IGT test), i915_globals_init() gets > > > > invoked again, we go to allocate a bunch of new memory slabs, those > > > > implicitly create debugfs entries, and debugfs warns that we're trying > > > > to create directories and files that already exist. Since this all > > > > happens as part of the next driver load, it shows up in the dmesg-warn > > > > of whatever IGT test ran after the mock selftests. > > > > > > > > While the obvious thing to do here might be to call i915_globals_exit() > > > > after selftests, that's not actually safe. The dma-buf selftests call > > > > i915_gem_prime_export which creates a file. We call dma_buf_put() on > > > > the resulting dmabuf which calls fput() on the file. However, fput() > > > > isn't immediate and gets flushed right before syscall returns. This > > > > means that all the fput()s from the selftests don't happen until right > > > > before the module load syscall used to fire off the selftests returns > > > > which is after i915_init(). If we call i915_globals_exit() in > > > > i915_init() after selftests, we end up freeing slabs out from under > > > > objects which won't get released until fput() is flushed at the end of > > > > the module load. > > > > > > > > The solution here is to let i915_init() return success early and detect > > > > the early success in i915_exit() and only tear down globals and nothing > > > > else. This way the module loads successfully, regardless of the success > > > > or failure of the tests. Because we've not enumerated any PCI devices, > > > > no device nodes are created and it's entirely useless from userspace. > > > > The only thing the module does at that point is hold on to a bit of > > > > memory until we unload it and i915_exit() is called. 
Importantly, this > > > > means that everything from our selftests has the ability to properly > > > > flush out between i915_init() and i915_exit() because there are a couple > > > > syscall boundaries in between. > > > > > > > > Signed-off-by: Jason Ekstrand > > > > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") > > > > Cc: Daniel Vetter > > > > --- > > > > drivers/gpu/drm/i915/i915_pci.c | 32 +--- > > > > 1 file changed, 25 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_pci.c > > > > b/dr
[PATCH 0/6] Fix the debugfs splat from mock selftests (v3)
This patch series fixes a miscellaneous collection of bugs that all add up to all our mock selftests throwing dmesg warnings in CI. As can be seen from "drm/i915: Use a table for i915_init/exit", it's especially fun since those warnings don't always show up in the selftests but can show up in other random IGTs depending on test execution order. Jason Ekstrand (6): drm/i915: Call i915_globals_exit() after i915_pmu_exit() drm/i915: Call i915_globals_exit() if pci_register_device() fails drm/i915: Use a table for i915_init/exit (v2) drm/ttm: Force re-init if ttm_global_init() fails drm/ttm: Initialize debugfs from ttm_global_init() drm/i915: Make the kmem slab for i915_buddy_block a global drivers/gpu/drm/i915/i915_buddy.c | 44 ++-- drivers/gpu/drm/i915/i915_buddy.h | 3 +- drivers/gpu/drm/i915/i915_globals.c | 6 +- drivers/gpu/drm/i915/i915_pci.c | 104 -- drivers/gpu/drm/i915/i915_perf.c | 3 +- drivers/gpu/drm/i915/i915_perf.h | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 4 +- drivers/gpu/drm/i915/i915_pmu.h | 4 +- .../gpu/drm/i915/selftests/i915_selftest.c| 4 +- drivers/gpu/drm/ttm/ttm_device.c | 14 +++ drivers/gpu/drm/ttm/ttm_module.c | 16 --- 11 files changed, 136 insertions(+), 68 deletions(-) -- 2.31.1
[PATCH 2/6] drm/i915: Call i915_globals_exit() if pci_register_device() fails
In the unlikely event that pci_register_device() fails, we were tearing down our PMU setup but not globals. This leaves a bunch of memory slabs lying around. Signed-off-by: Jason Ekstrand Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_globals.c | 4 ++-- drivers/gpu/drm/i915/i915_pci.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c index 77f1911c463b8..87267e1d2ad92 100644 --- a/drivers/gpu/drm/i915/i915_globals.c +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -138,7 +138,7 @@ void i915_globals_unpark(void) atomic_inc(&active); } -static void __exit __i915_globals_flush(void) +static void __i915_globals_flush(void) { atomic_inc(&active); /* skip shrinking */ @@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void) atomic_dec(&active); } -void __exit i915_globals_exit(void) +void i915_globals_exit(void) { GEM_BUG_ON(atomic_read(&active)); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 50ed93b03e582..4e627b57d31a2 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1230,6 +1230,7 @@ static int __init i915_init(void) err = pci_register_driver(&i915_pci_driver); if (err) { i915_pmu_exit(); + i915_globals_exit(); return err; } -- 2.31.1
[PATCH 1/6] drm/i915: Call i915_globals_exit() after i915_pmu_exit()
We should tear down in the opposite order we set up. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 67696d7522718..50ed93b03e582 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1244,8 +1244,8 @@ static void __exit i915_exit(void) i915_perf_sysctl_unregister(); pci_unregister_driver(&i915_pci_driver); - i915_globals_exit(); i915_pmu_exit(); + i915_globals_exit(); } module_init(i915_init); -- 2.31.1
[PATCH 3/6] drm/i915: Use a table for i915_init/exit (v2)
If the driver was not fully loaded, we may still have globals lying around. If we don't tear those down in i915_exit(), we'll leak a bunch of memory slabs. This can happen two ways: use_kms = false and if we've run mock selftests. In either case, we have an early exit from i915_init which happens after i915_globals_init() and we need to clean up those globals. The mock selftests case is especially sticky. The load isn't entirely a no-op. We actually do quite a bit inside those selftests including allocating a bunch of mock objects and running tests on them. Once all those tests are complete, we exit early from i915_init(). Previously, i915_init() would return a non-zero error code on failure and a zero error code on success. In the success case, we would get to i915_exit() and check i915_pci_driver.driver.owner to detect if i915_init exited early and do nothing. In the failure case, we would fail i915_init() but there would be no opportunity to clean up globals. The most annoying part is that you don't actually notice the failure as part of the self-tests since leaking a bit of memory, while bad, doesn't result in anything observable from userspace. Instead, the next time we load the driver (usually for next IGT test), i915_globals_init() gets invoked again, we go to allocate a bunch of new memory slabs, those implicitly create debugfs entries, and debugfs warns that we're trying to create directories and files that already exist. Since this all happens as part of the next driver load, it shows up in the dmesg-warn of whatever IGT test ran after the mock selftests. While the obvious thing to do here might be to call i915_globals_exit() after selftests, that's not actually safe. The dma-buf selftests call i915_gem_prime_export which creates a file. We call dma_buf_put() on the resulting dmabuf which calls fput() on the file. However, fput() isn't immediate and gets flushed right before syscall returns.
This means that all the fput()s from the selftests don't happen until right before the module load syscall used to fire off the selftests returns which is after i915_init(). If we call i915_globals_exit() in i915_init() after selftests, we end up freeing slabs out from under objects which won't get released until fput() is flushed at the end of the module load syscall. The solution here is to let i915_init() return success early and detect the early success in i915_exit() and only tear down globals and nothing else. This way the module loads successfully, regardless of the success or failure of the tests. Because we've not enumerated any PCI devices, no device nodes are created and it's entirely useless from userspace. The only thing the module does at that point is hold on to a bit of memory until we unload it and i915_exit() is called. Importantly, this means that everything from our selftests has the ability to properly flush out between i915_init() and i915_exit() because there is at least one syscall boundary in between. In order to handle all the delicate init/exit cases, we convert the whole thing to a table of init/exit pairs and track the init status in the new init_progress global. This allows us to ensure that i915_exit() always tears down exactly the things that i915_init() successfully initialized. We also allow early-exit of i915_init() without failure by an init function returning > 0. This is useful for nomodeset, and selftests. For the mock selftests, we convert them to always return 1 so we get the desired behavior of the driver always succeeding to load the driver and then properly tearing down the partially loaded driver. 
v2 (Tvrtko Ursulin):
 - Guard init_funcs[i].exit with GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs))
v2 (Daniel Vetter):
 - Update the docstring for i915.mock_selftests

Signed-off-by: Jason Ekstrand
Reviewed-by: Daniel Vetter
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_pci.c               | 105 --
 drivers/gpu/drm/i915/i915_perf.c              |   3 +-
 drivers/gpu/drm/i915/i915_perf.h              |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c               |   4 +-
 drivers/gpu/drm/i915/i915_pmu.h               |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c    |   4 +-
 6 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4e627b57d31a2..5f05cb1b5ac6b 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev)
 	i915_driver_shutdown(i915);
 }
 
-static struct pci_driver i915_pci_driver = {
-	.name = DRIVER_NAME,
-	.id_table = pciidlist,
-	.probe = i915_pci_probe,
-	.remove = i915_pci_remove,
-	.shutdown = i915_pci_shutdown,
-	.driver.pm = &i915_pm_ops,
-};
-
-static int __init i915_init(void)
+static int i915_check_nom
[PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global
There's no reason I can see why this should be per-i915_buddy_mm, and making it so causes KMEM_CACHE to throw dmesg warnings because it tries to create a debugfs entry with the name i915_buddy_block multiple times. We could handle this by carefully giving each slab its own name, but that brings its own pain because then we have to store that string somewhere and manage the lifetimes of the different slabs. The most likely outcome would be a global atomic which we increment to get a new name, or something like that.

The much easier solution is to use the i915_globals system like we do for every other slab in i915. This ensures that we have exactly one of them for each i915 driver load, and it gets neatly created on module load and destroyed on module unload. Using the globals system also means that it's now tied into the shrink handler, so we can properly respond to low-memory situations.

Signed-off-by: Jason Ekstrand
Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
Cc: Matthew Auld
Cc: Christian König
---
 drivers/gpu/drm/i915/i915_buddy.c   | 44 ++---
 drivers/gpu/drm/i915/i915_buddy.h   |  3 +-
 drivers/gpu/drm/i915/i915_globals.c |  2 ++
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_buddy.c b/drivers/gpu/drm/i915/i915_buddy.c
index 29dd7d0310c1f..911feedad4513 100644
--- a/drivers/gpu/drm/i915/i915_buddy.c
+++ b/drivers/gpu/drm/i915/i915_buddy.c
@@ -8,8 +8,14 @@
 #include "i915_buddy.h"
 #include "i915_gem.h"
+#include "i915_globals.h"
 #include "i915_utils.h"
 
+static struct i915_global_buddy {
+	struct i915_global base;
+	struct kmem_cache *slab_blocks;
+} global;
+
 static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 						 struct i915_buddy_block *parent,
 						 unsigned int order,
@@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 
 	GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
 
-	block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
+	block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
 	if (!block)
 		return NULL;
@@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
 static void i915_block_free(struct i915_buddy_mm *mm,
 			    struct i915_buddy_block *block)
 {
-	kmem_cache_free(mm->slab_blocks, block);
+	kmem_cache_free(global.slab_blocks, block);
 }
 
 static void mark_allocated(struct i915_buddy_block *block)
@@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size)
 
 	GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
 
-	mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
-	if (!mm->slab_blocks)
-		return -ENOMEM;
-
 	mm->free_list = kmalloc_array(mm->max_order + 1,
 				      sizeof(struct list_head),
 				      GFP_KERNEL);
 	if (!mm->free_list)
-		goto out_destroy_slab;
+		return -ENOMEM;
 
 	for (i = 0; i <= mm->max_order; ++i)
 		INIT_LIST_HEAD(&mm->free_list[i]);
@@ -145,8 +147,6 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size)
 	kfree(mm->roots);
 out_free_list:
 	kfree(mm->free_list);
-out_destroy_slab:
-	kmem_cache_destroy(mm->slab_blocks);
 	return -ENOMEM;
 }
 
@@ -161,7 +161,6 @@ void i915_buddy_fini(struct i915_buddy_mm *mm)
 
 	kfree(mm->roots);
 	kfree(mm->free_list);
-	kmem_cache_destroy(mm->slab_blocks);
 }
 
 static int split_block(struct i915_buddy_mm *mm,
@@ -410,3 +409,28 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_buddy.c"
 #endif
+
+static void i915_global_buddy_shrink(void)
+{
+	kmem_cache_shrink(global.slab_blocks);
+}
+
+static void i915_global_buddy_exit(void)
+{
+	kmem_cache_destroy(global.slab_blocks);
+}
+
+static struct i915_global_buddy global = { {
+	.shrink = i915_global_buddy_shrink,
+	.exit = i915_global_buddy_exit,
+} };
+
+int __init i915_global_buddy_init(void)
+{
+	global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
+	if (!global.slab_blocks)
+		return -ENOMEM;
+
+	i915_global_register(&global.base);
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_buddy.h b/drivers/gpu/drm/i915/i915_buddy.h
index 37f8c42071d12..d8f26706de52f 100644
--- a/drivers/gpu/drm/i915/i915_buddy.h
+++ b/drivers/gpu/drm/i915/i915_buddy.h
@@ -47,7 +47,6 @@ struct i915_buddy_block {
  * i915_buddy_alloc* and i915_buddy_free* should suffice.
  */
 struct i915_buddy_mm {
-	struct k
[PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init()
We create a bunch of debugfs entries as a side-effect of ttm_global_init() and then never clean them up. This isn't usually a problem because we free the whole debugfs directory on module unload. However, if the global reference count ever goes to zero and then ttm_global_init() is called again, we'll re-create those debugfs entries, and debugfs will complain in dmesg that we're creating entries that already exist.

This patch fixes the problem by changing the lifetime of the whole TTM debugfs directory to match that of the TTM global state.

Signed-off-by: Jason Ekstrand
Reviewed-by: Daniel Vetter
---
 drivers/gpu/drm/ttm/ttm_device.c | 12 
 drivers/gpu/drm/ttm/ttm_module.c | 16 
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 519deea8e39b7..74e3b460132b3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -44,6 +44,8 @@ static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
 
+struct dentry *ttm_debugfs_root;
+
 static void ttm_global_release(void)
 {
 	struct ttm_global *glob = &ttm_glob;
@@ -53,6 +55,7 @@ static void ttm_global_release(void)
 		goto out;
 
 	ttm_pool_mgr_fini();
+	debugfs_remove(ttm_debugfs_root);
 
 	__free_page(glob->dummy_read_page);
 	memset(glob, 0, sizeof(*glob));
@@ -73,6 +76,13 @@ static int ttm_global_init(void)
 
 	si_meminfo(&si);
 
+	ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
+	if (IS_ERR(ttm_debugfs_root)) {
+		ret = PTR_ERR(ttm_debugfs_root);
+		ttm_debugfs_root = NULL;
+		goto out;
+	}
+
 	/* Limit the number of pages in the pool to about 50% of the total
 	 * system memory.
 	 */
@@ -100,6 +110,8 @@ static int ttm_global_init(void)
 	debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
 				&glob->bo_count);
 out:
+	if (ret && ttm_debugfs_root)
+		debugfs_remove(ttm_debugfs_root);
 	if (ret)
 		--ttm_glob_use_count;
 	mutex_unlock(&ttm_global_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9a..7fcdef278c742 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -72,22 +72,6 @@ pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp)
 	return tmp;
 }
 
-struct dentry *ttm_debugfs_root;
-
-static int __init ttm_init(void)
-{
-	ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
-	return 0;
-}
-
-static void __exit ttm_exit(void)
-{
-	debugfs_remove(ttm_debugfs_root);
-}
-
-module_init(ttm_init);
-module_exit(ttm_exit);
-
 MODULE_AUTHOR("Thomas Hellstrom, Jerome Glisse");
 MODULE_DESCRIPTION("TTM memory manager subsystem (for DRM device)");
 MODULE_LICENSE("GPL and additional rights");
-- 
2.31.1
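The new lifetime can be sketched as a userspace model, with the debugfs directory reduced to a flag (all names here are illustrative, not the real TTM symbols): the directory is created on the 0-to-1 refcount transition, removed on the 1-to-0 transition, and also removed if init fails partway.

```c
#include <assert.h>

static unsigned int tg_use_count;
static int tg_dir_exists;	/* stands in for the "ttm" debugfs dir */
static int tg_fail_after_dir;	/* test knob: fail init after mkdir */

/* Model of ttm_global_init(): the debugfs directory now lives exactly
 * as long as the refcounted global state, including on the error path. */
int ttm_global_get_model(void)
{
	int ret = 0;

	if (++tg_use_count > 1)
		return 0;	/* already initialized by an earlier user */

	tg_dir_exists = 1;	/* debugfs_create_dir("ttm", NULL) */
	if (tg_fail_after_dir)
		ret = -1;	/* simulated later init failure */

	if (ret && tg_dir_exists)
		tg_dir_exists = 0;	/* debugfs_remove() on error */
	if (ret)
		--tg_use_count;
	return ret;
}

void ttm_global_put_model(void)
{
	if (--tg_use_count > 0)
		return;
	tg_dir_exists = 0;	/* debugfs_remove() on the last put */
}
```

Because creation and removal are both inside the get/put pair, a second init after the count drops to zero recreates the directory cleanly instead of colliding with a stale one.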
[PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails
If we have a failure, decrement the reference count so that the next call to ttm_global_init() will actually do something instead of assuming everything is already set up.

Signed-off-by: Jason Ekstrand
Fixes: 62b53b37e4b1 ("drm/ttm: use a static ttm_bo_global instance")
Reviewed-by: Christian König
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 5f31acec3ad76..519deea8e39b7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -100,6 +100,8 @@ static int ttm_global_init(void)
 	debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
 				&glob->bo_count);
 out:
+	if (ret)
+		--ttm_glob_use_count;
 	mutex_unlock(&ttm_global_mutex);
 	return ret;
 }
-- 
2.31.1
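The bug being fixed can be modelled in a few lines of userspace C (illustrative names, not the real TTM symbols): without the decrement on failure, the use count stays nonzero after a failed init, so every later caller sees "> 1" and skips the initialization forever.

```c
#include <assert.h>

static unsigned int use_count;
static int initialized;
static int fail_next;	/* test knob simulating an init error */

/* Model of ttm_global_init()'s get-or-init pattern with the fix applied:
 * drop the reference count back to zero on failure so the next caller
 * actually re-runs the initialization. */
int global_get_model(void)
{
	int ret = 0;

	if (++use_count > 1)
		return 0;	/* somebody already set it up */

	if (fail_next)
		ret = -1;	/* simulated failure */
	else
		initialized = 1;

	if (ret)
		--use_count;	/* the fix: force re-init next time */
	return ret;
}

void global_put_model(void)
{
	if (--use_count == 0)
		initialized = 0;	/* model of ttm_global_release() */
}
```

Remove the `--use_count` on the error path and the second call would return "success" while `initialized` is still zero, which is exactly the stale-state problem the commit message describes.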
Re: [PATCH 3/7] drm/i915/gem: Unify user object creation
On Wed, Jul 21, 2021 at 3:25 AM Matthew Auld wrote: > > On Tue, 20 Jul 2021 at 23:04, Jason Ekstrand wrote: > > > > On Tue, Jul 20, 2021 at 4:35 AM Matthew Auld > > wrote: > > > > > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand wrote: > > > > > > > > Instead of hand-rolling the same three calls in each function, pull them > > > > into an i915_gem_object_create_user helper. Apart from re-ordering of > > > > the placements array ENOMEM check, the only functional change here > > > > should be that i915_gem_dumb_create now calls > > > > i915_gem_flush_free_objects > > > > which it probably should have been calling all along. > > > > > > > > Signed-off-by: Jason Ekstrand > > > > --- > > > > drivers/gpu/drm/i915/gem/i915_gem_create.c | 106 + > > > > 1 file changed, 43 insertions(+), 63 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > index 391c8c4a12172..69bf9ec777642 100644 > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > @@ -11,13 +11,14 @@ > > > > #include "i915_trace.h" > > > > #include "i915_user_extensions.h" > > > > > > > > -static u32 object_max_page_size(struct drm_i915_gem_object *obj) > > > > +static u32 object_max_page_size(struct intel_memory_region > > > > **placements, > > > > + unsigned int n_placements) > > > > { > > > > u32 max_page_size = 0; > > > > int i; > > > > > > > > - for (i = 0; i < obj->mm.n_placements; i++) { > > > > - struct intel_memory_region *mr = obj->mm.placements[i]; > > > > + for (i = 0; i < n_placements; i++) { > > > > + struct intel_memory_region *mr = placements[i]; > > > > > > > > GEM_BUG_ON(!is_power_of_2(mr->min_page_size)); > > > > max_page_size = max_t(u32, max_page_size, > > > > mr->min_page_size); > > > > @@ -81,22 +82,35 @@ static int i915_gem_publish(struct > > > > drm_i915_gem_object *obj, > > > > return 0; > > > > } > > > > > > > > -static int > > > > 
-i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) > > > > +static struct drm_i915_gem_object * > > > > +i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, > > > > + struct intel_memory_region **placements, > > > > + unsigned int n_placements) > > > > { > > > > - struct intel_memory_region *mr = obj->mm.placements[0]; > > > > + struct intel_memory_region *mr = placements[0]; > > > > + struct drm_i915_gem_object *obj; > > > > unsigned int flags; > > > > int ret; > > > > > > > > - size = round_up(size, object_max_page_size(obj)); > > > > + i915_gem_flush_free_objects(i915); > > > > + > > > > + obj = i915_gem_object_alloc(); > > > > + if (!obj) > > > > + return ERR_PTR(-ENOMEM); > > > > + > > > > + size = round_up(size, object_max_page_size(placements, > > > > n_placements)); > > > > if (size == 0) > > > > - return -EINVAL; > > > > + return ERR_PTR(-EINVAL); > > > > > > > > /* For most of the ABI (e.g. mmap) we think in system pages */ > > > > GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); > > > > > > > > if (i915_gem_object_size_2big(size)) > > > > - return -E2BIG; > > > > + return ERR_PTR(-E2BIG); > > > > + > > > > + ret = object_set_placements(obj, placements, n_placements); > > > > + if (ret) > > > > + goto object_free; > > > > > > Thinking on this again, it might be way too thorny to expose > > > create_user as-is to other parts of i915, like we do in the last > > > patch. Since the caller will be expected to manually validate the > > > placements, otherwise we migh
Re: [PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)
On Wed, Jul 21, 2021 at 3:32 AM Matthew Auld wrote: > > On Tue, 20 Jul 2021 at 23:07, Jason Ekstrand wrote: > > > > On Mon, Jul 19, 2021 at 3:18 AM Matthew Auld > > wrote: > > > > > > On Fri, 16 Jul 2021 at 15:14, Jason Ekstrand wrote: > > > > > > > > Since we don't allow changing the set of regions after creation, we can > > > > make ext_set_placements() build up the region set directly in the > > > > create_ext and assign it to the object later. This is similar to what > > > > we did for contexts with the proto-context only simpler because there's > > > > no funny object shuffling. This will be used in the next patch to allow > > > > us to de-duplicate a bunch of code. Also, since we know the maximum > > > > number of regions up-front, we can use a fixed-size temporary array for > > > > the regions. This simplifies memory management a bit for this new > > > > delayed approach. > > > > > > > > v2 (Matthew Auld): > > > > - Get rid of MAX_N_PLACEMENTS > > > > - Drop kfree(placements) from set_placements() > > > > > > > > Signed-off-by: Jason Ekstrand > > > > Cc: Matthew Auld > > > > --- > > > > drivers/gpu/drm/i915/gem/i915_gem_create.c | 81 -- > > > > 1 file changed, 45 insertions(+), 36 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > index 51f92e4b1a69d..5766749a449c0 100644 > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > > > > @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct > > > > drm_i915_gem_object *obj) > > > > return max_page_size; > > > > } > > > > > > > > -static void object_set_placements(struct drm_i915_gem_object *obj, > > > > - struct intel_memory_region > > > > **placements, > > > > - unsigned int n_placements) > > > > +static int object_set_placements(struct drm_i915_gem_object *obj, > > > > +struct intel_memory_region > > > > **placements, > > > > +unsigned int n_placements) 
> > > > { > > > > + struct intel_memory_region **arr; > > > > + unsigned int i; > > > > + > > > > GEM_BUG_ON(!n_placements); > > > > > > > > /* > > > > @@ -44,9 +47,20 @@ static void object_set_placements(struct > > > > drm_i915_gem_object *obj, > > > > obj->mm.placements = &i915->mm.regions[mr->id]; > > > > obj->mm.n_placements = 1; > > > > } else { > > > > - obj->mm.placements = placements; > > > > + arr = kmalloc_array(n_placements, > > > > + sizeof(struct intel_memory_region > > > > *), > > > > + GFP_KERNEL); > > > > + if (!arr) > > > > + return -ENOMEM; > > > > + > > > > + for (i = 0; i < n_placements; i++) > > > > + arr[i] = placements[i]; > > > > + > > > > + obj->mm.placements = arr; > > > > obj->mm.n_placements = n_placements; > > > > } > > > > + > > > > + return 0; > > > > } > > > > > > > > static int i915_gem_publish(struct drm_i915_gem_object *obj, > > > > @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file, > > > > return -ENOMEM; > > > > > > > > mr = intel_memory_region_by_type(to_i915(dev), mem_type); > > > > - object_set_placements(obj, &mr, 1); > > > > + ret = object_set_placements(obj, &mr, 1); > > > > + if (ret) > > > > + goto object_free; > > > > > > > > ret = i915_gem_setup(obj, args->size); > > > > if (ret) > > > > @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void > > > > *data
Re: [PATCH 5/7] drm/i915/gem/ttm: Respect the object's region in placement_from_obj
On Mon, Jul 19, 2021 at 8:35 AM Matthew Auld wrote: > > On Fri, 16 Jul 2021 at 20:49, Jason Ekstrand wrote: > > > > On Fri, Jul 16, 2021 at 1:45 PM Matthew Auld > > wrote: > > > > > > On Fri, 16 Jul 2021 at 18:39, Jason Ekstrand wrote: > > > > > > > > On Fri, Jul 16, 2021 at 11:00 AM Matthew Auld > > > > wrote: > > > > > > > > > > On Fri, 16 Jul 2021 at 16:52, Matthew Auld > > > > > wrote: > > > > > > > > > > > > On Fri, 16 Jul 2021 at 15:10, Jason Ekstrand > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Jul 16, 2021 at 8:54 AM Matthew Auld > > > > > > > wrote: > > > > > > > > > > > > > > > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Whenever we had a user object (n_placements > 0), we were > > > > > > > > > ignoring > > > > > > > > > obj->mm.region and always putting obj->placements[0] as the > > > > > > > > > requested > > > > > > > > > region. For LMEM+SMEM objects, this was causing them to get > > > > > > > > > shoved into > > > > > > > > > LMEM on every i915_ttm_get_pages() even when SMEM was > > > > > > > > > requested by, say, > > > > > > > > > i915_gem_object_migrate(). > > > > > > > > > > > > > > > > i915_ttm_migrate calls i915_ttm_place_from_region() directly > > > > > > > > with the > > > > > > > > requested region, so there shouldn't be an issue with migration > > > > > > > > right? > > > > > > > > Do you have some more details? > > > > > > > > > > > > > > With i915_ttm_migrate directly, no. But, in the last patch in the > > > > > > > series, we're trying to migrate LMEM+SMEM buffers into SMEM on > > > > > > > attach() and pin it there. This blows up in a very unexpected > > > > > > > (IMO) > > > > > > > way. The flow goes something like this: > > > > > > > > > > > > > > - Client attempts a dma-buf import from another device > > > > > > > - In attach() we call i915_gem_object_migrate() which calls > > > > > > > i915_ttm_migrate() which migrates as requested. 
> > > > > > > - Once the migration is complete, we call > > > > > > > i915_gem_object_pin_pages() > > > > > > > which calls i915_ttm_get_pages() which depends on > > > > > > > i915_ttm_placement_from_obj() and so migrates it right back to > > > > > > > LMEM. > > > > > > > > > > > > The mm.pages must be NULL here, otherwise it would just increment > > > > > > the > > > > > > pages_pin_count? > > > > > > > > Given that the test is using the four_underscores version, it > > > > doesn't have that check. However, this executes after we've done the > > > > dma-buf import which pinned pages. So we should definitely have > > > > pages. > > > > > > We shouldn't call four_underscores() if we might already have > > > pages though. Under non-TTM that would leak the pages, and in TTM we > > > might hit the WARN_ON(mm->pages) in __i915_ttm_get_pages(), if for > > > example nothing was moved. I take it we can't just call pin_pages()? > > > Four scary underscores usually means "don't call this in normal code". > > > > I've switched the four_underscores call to a __two_underscores in > > the selftests and it had no effect, good or bad. But, still, probably > > better to call that one. > > > > > > > > > > > > > > > > > > > > Maybe the problem here is actually that our TTM code isn't > > > > > > > respecting > > > > > > > obj->mm.pages_pin_count? > > > > > > > > > > > > I think if the resource is moved, we always nuke the mm.pages after > > > > > > being notified of the move. Also TTM is also not
[PATCH 0/7] drm/i915: Migrate memory to SMEM when imported cross-device (v8)
This patch series fixes an issue with discrete graphics on Intel where we allowed dma-buf import while leaving the object in local memory. This breaks down pretty badly if the import happened on a different physical device.

v7:
 - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
 - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()"
 - Misc. review feedback from Matthew Auld
v8:
 - Misc. review feedback from Matthew Auld

Jason Ekstrand (5):
  drm/i915/gem: Check object_can_migrate from object_migrate
  drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)
  drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()
  drm/i915/gem: Unify user object creation (v3)
  drm/i915/gem/ttm: Respect the object's region in placement_from_obj

Thomas Hellström (2):
  drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
  drm/i915/gem: Migrate to system at dma-buf attach time (v7)

 drivers/gpu/drm/i915/gem/i915_gem_create.c    | 177 
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  58 --
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  13 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   3 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 -
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
 7 files changed, 330 insertions(+), 130 deletions(-)

-- 
2.31.1
[PATCH 1/7] drm/i915/gem: Check object_can_migrate from object_migrate
We don't roll them together entirely because there are still a couple cases where we want a separate can_migrate check. For instance, the display code checks that you can migrate a buffer to LMEM before it accepts it in fb_create. The dma-buf import code also uses it to do an early check and return a different error code if someone tries to attach a LMEM-only dma-buf to another driver.

However, no one actually wants to call object_migrate when can_migrate has failed. The stated intention is for selftests, but none of those actually take advantage of this unsafe migration.

Signed-off-by: Jason Ekstrand
Cc: Daniel Vetter
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c        | 13 ++---
 .../gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 15 ---
 2 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 9da7b288b7ede..f2244ae09a613 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -584,12 +584,6 @@ bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj,
  * completed yet, and to accomplish that, i915_gem_object_wait_migration()
  * must be called.
  *
- * This function is a bit more permissive than i915_gem_object_can_migrate()
- * to allow for migrating objects where the caller knows exactly what is
- * happening. For example within selftests. More specifically this
- * function allows migrating I915_BO_ALLOC_USER objects to regions
- * that are not in the list of allowable regions.
- *
  * Note: the @ww parameter is not used yet, but included to make sure
  * callers put some effort into obtaining a valid ww ctx if one is
  * available.
@@ -616,11 +610,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj,
 	if (obj->mm.region == mr)
 		return 0;
 
-	if (!i915_gem_object_evictable(obj))
-		return -EBUSY;
-
-	if (!obj->ops->migrate)
-		return -EOPNOTSUPP;
+	if (!i915_gem_object_can_migrate(obj, id))
+		return -EINVAL;
 
 	return obj->ops->migrate(obj, mr);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 0b7144d2991ca..28a700f08b49a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -61,11 +61,6 @@ static int igt_create_migrate(struct intel_gt *gt, enum intel_region_id src,
 		if (err)
 			continue;
 
-		if (!i915_gem_object_can_migrate(obj, dst)) {
-			err = -EINVAL;
-			continue;
-		}
-
 		err = i915_gem_object_migrate(obj, &ww, dst);
 		if (err)
 			continue;
@@ -114,11 +109,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 		return err;
 
 	if (i915_gem_object_is_lmem(obj)) {
-		if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) {
-			pr_err("object can't migrate to smem.\n");
-			return -EINVAL;
-		}
-
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
 		if (err) {
 			pr_err("Object failed migration to smem\n");
@@ -137,11 +127,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
 	} else {
-		if (!i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) {
-			pr_err("object can't migrate to lmem.\n");
-			return -EINVAL;
-		}
-
 		err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM);
 		if (err) {
 			pr_err("Object failed migration to lmem\n");
-- 
2.31.1
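The new contract, migration is only allowed to regions the object permits, and anything else fails with -EINVAL, can be sketched as a toy userspace model. All names here (obj_model, migrate_model, the region enum) are hypothetical; the real object tracks an array of intel_memory_region pointers rather than a bitmask.

```c
#include <assert.h>
#include <errno.h>

enum region_id { REG_SMEM, REG_LMEM };

/* Toy object: current region plus a bitmask of allowed placements. */
struct obj_model {
	enum region_id region;
	unsigned int allowed_mask;
};

int can_migrate_model(const struct obj_model *obj, enum region_id dst)
{
	return (obj->allowed_mask >> dst) & 1;
}

/* After the patch, every migration goes through the can_migrate check,
 * so requesting a region outside the allowed list fails with -EINVAL
 * instead of being silently permitted. */
int migrate_model(struct obj_model *obj, enum region_id dst)
{
	if (obj->region == dst)
		return 0;
	if (!can_migrate_model(obj, dst))
		return -EINVAL;
	obj->region = dst;
	return 0;
}
```

An LMEM-only object therefore refuses to move to SMEM, while an LMEM+SMEM object migrates normally.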
[PATCH 2/7] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)
Since we don't allow changing the set of regions after creation, we can make ext_set_placements() build up the region set directly in the create_ext and assign it to the object later. This is similar to what we did for contexts with the proto-context only simpler because there's no funny object shuffling. This will be used in the next patch to allow us to de-duplicate a bunch of code. Also, since we know the maximum number of regions up-front, we can use a fixed-size temporary array for the regions. This simplifies memory management a bit for this new delayed approach. v2 (Matthew Auld): - Get rid of MAX_N_PLACEMENTS - Drop kfree(placements) from set_placements() v3 (Matthew Auld): - Properly set ext_data->n_placements Signed-off-by: Jason Ekstrand Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 82 -- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 51f92e4b1a69d..aa687b10dcd45 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object *obj) return max_page_size; } -static void object_set_placements(struct drm_i915_gem_object *obj, - struct intel_memory_region **placements, - unsigned int n_placements) +static int object_set_placements(struct drm_i915_gem_object *obj, +struct intel_memory_region **placements, +unsigned int n_placements) { + struct intel_memory_region **arr; + unsigned int i; + GEM_BUG_ON(!n_placements); /* @@ -44,9 +47,20 @@ static void object_set_placements(struct drm_i915_gem_object *obj, obj->mm.placements = &i915->mm.regions[mr->id]; obj->mm.n_placements = 1; } else { - obj->mm.placements = placements; + arr = kmalloc_array(n_placements, + sizeof(struct intel_memory_region *), + GFP_KERNEL); + if (!arr) + return -ENOMEM; + + for (i = 0; i < n_placements; i++) + arr[i] = placements[i]; + + 
obj->mm.placements = arr; obj->mm.n_placements = n_placements; } + + return 0; } static int i915_gem_publish(struct drm_i915_gem_object *obj, @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file, return -ENOMEM; mr = intel_memory_region_by_type(to_i915(dev), mem_type); - object_set_placements(obj, &mr, 1); + ret = object_set_placements(obj, &mr, 1); + if (ret) + goto object_free; ret = i915_gem_setup(obj, args->size); if (ret) @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, return -ENOMEM; mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM); - object_set_placements(obj, &mr, 1); + ret = object_set_placements(obj, &mr, 1); + if (ret) + goto object_free; ret = i915_gem_setup(obj, args->size); if (ret) @@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, struct create_ext { struct drm_i915_private *i915; - struct drm_i915_gem_object *vanilla_object; + struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; + unsigned int n_placements; }; static void repr_placements(char *buf, size_t size, @@ -230,8 +249,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, struct drm_i915_private *i915 = ext_data->i915; struct drm_i915_gem_memory_class_instance __user *uregions = u64_to_user_ptr(args->regions); - struct drm_i915_gem_object *obj = ext_data->vanilla_object; - struct intel_memory_region **placements; + struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; u32 mask; int i, ret = 0; @@ -245,6 +263,8 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, ret = -EINVAL; } + BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements)); + BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != ARRAY_SIZE(placements)); if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) { drm_dbg(&i915->drm, "num_regions is too large\n"); ret = -EINVAL; @@ -253,21 +273,13 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions 
*args, if (ret) return ret; - placements
[PATCH 3/7] drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()
This doesn't really fix anything serious, since the chances of a client creating and destroying a mass of dumb BOs are pretty low. However, i915_gem_flush_free_objects() is called by the other two create IOCTLs to garbage collect old objects. Call it here too for consistency.

Signed-off-by: Jason Ekstrand
Reviewed-by: Matthew Auld
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index aa687b10dcd45..adcce37c04b8d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -151,6 +151,8 @@ i915_gem_dumb_create(struct drm_file *file,
 	if (args->pitch < args->width)
 		return -EINVAL;
 
+	i915_gem_flush_free_objects(i915);
+
 	args->size = mul_u32_u32(args->pitch, args->height);
 	mem_type = INTEL_MEMORY_SYSTEM;
-- 
2.31.1
[PATCH 4/7] drm/i915/gem: Unify user object creation (v3)
Instead of hand-rolling the same three calls in each function, pull them into an i915_gem_object_create_user helper. Apart from re-ordering of the placements array ENOMEM check, there should be no functional change. v2 (Matthew Auld): - Add the call to i915_gem_flush_free_objects() from i915_gem_dumb_create() in a separate patch - Move i915_gem_object_alloc() below the simple error checks v3 (Matthew Auld): - Add __ to i915_gem_object_create_user and kerneldoc which warns the caller that it's not validating anything. Signed-off-by: Jason Ekstrand Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 119 ++--- drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 + 2 files changed, 58 insertions(+), 65 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index adcce37c04b8d..23fee13a33844 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -11,13 +11,14 @@ #include "i915_trace.h" #include "i915_user_extensions.h" -static u32 object_max_page_size(struct drm_i915_gem_object *obj) +static u32 object_max_page_size(struct intel_memory_region **placements, + unsigned int n_placements) { u32 max_page_size = 0; int i; - for (i = 0; i < obj->mm.n_placements; i++) { - struct intel_memory_region *mr = obj->mm.placements[i]; + for (i = 0; i < n_placements; i++) { + struct intel_memory_region *mr = placements[i]; GEM_BUG_ON(!is_power_of_2(mr->min_page_size)); max_page_size = max_t(u32, max_page_size, mr->min_page_size); @@ -81,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj, return 0; } -static int -i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) +/** + * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT + * @i915: i915 private + * @size: size of the buffer, in bytes + * @placements: possible placement regions, in priority order + * @n_placements: number of possible placement regions + * + * This 
function is exposed primarily for selftests and does very little + * error checking. It is assumed that the set of placement regions has + * already been verified to be valid. + */ +struct drm_i915_gem_object * +__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, + struct intel_memory_region **placements, + unsigned int n_placements) { - struct intel_memory_region *mr = obj->mm.placements[0]; + struct intel_memory_region *mr = placements[0]; + struct drm_i915_gem_object *obj; unsigned int flags; int ret; - size = round_up(size, object_max_page_size(obj)); + i915_gem_flush_free_objects(i915); + + size = round_up(size, object_max_page_size(placements, n_placements)); if (size == 0) - return -EINVAL; + return ERR_PTR(-EINVAL); /* For most of the ABI (e.g. mmap) we think in system pages */ GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); if (i915_gem_object_size_2big(size)) - return -E2BIG; + return ERR_PTR(-E2BIG); + + obj = i915_gem_object_alloc(); + if (!obj) + return ERR_PTR(-ENOMEM); + + ret = object_set_placements(obj, placements, n_placements); + if (ret) + goto object_free; /* * I915_BO_ALLOC_USER will make sure the object is cleared before @@ -106,12 +131,18 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) ret = mr->ops->init_object(mr, obj, size, 0, flags); if (ret) - return ret; + goto object_free; GEM_BUG_ON(size != obj->base.size); trace_i915_gem_object_create(obj); - return 0; + return obj; + +object_free: + if (obj->mm.n_placements > 1) + kfree(obj->mm.placements); + i915_gem_object_free(obj); + return ERR_PTR(ret); } int @@ -124,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file, enum intel_memory_type mem_type; int cpp = DIV_ROUND_UP(args->bpp, 8); u32 format; - int ret; switch (cpp) { case 1: @@ -151,32 +181,19 @@ i915_gem_dumb_create(struct drm_file *file, if (args->pitch < args->width) return -EINVAL; - i915_gem_flush_free_objects(i915); - args->size = mul_u32_u32(args->pitch, args->height); mem_type = 
INTEL_MEMORY_SYSTEM; if (HAS_LMEM(to_i915(dev))) mem_type = INTEL_MEMORY_LOCAL; - obj = i915_gem_object_alloc(); - if (!obj) - return -ENOMEM; - mr = intel_memory_region_by_type(to_i915(dev), mem_type); -
[PATCH 5/7] drm/i915/gem/ttm: Respect the object region in placement_from_obj
Whenever we had a user object (n_placements > 0), we were ignoring obj->mm.region and always putting obj->placements[0] as the requested region. For LMEM+SMEM objects, this was causing them to get shoved into LMEM on every i915_ttm_get_pages() even when SMEM was requested by, say, i915_gem_object_migrate(). Signed-off-by: Jason Ekstrand Cc: Thomas Hellström Cc: Matthew Auld Cc: Maarten Lankhorst --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index f253b11e9e367..b76bdd978a5cc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -150,8 +150,7 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, unsigned int i; placement->num_placement = 1; - i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : - obj->mm.region, requested, flags); + i915_ttm_place_from_region(obj->mm.region, requested, flags); /* Cache this on object? */ placement->num_busy_placement = num_allowed; -- 2.31.1
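For readers skimming the archive, the one-line fix above can be modelled outside the driver. This is a schematic sketch under assumed names (`mem_region`, `gem_object`, `requested_before/after` are illustrative, not i915 API): before the fix, any user object reported `placements[0]` as its requested region, silently overriding the region a migration had just asked for.

```c
/* Schematic illustration only -- not the i915 code. "region" is the
 * placement most recently requested (e.g. by a migrate call);
 * "placements" is the immutable priority list given at creation. */
struct mem_region { int id; };

struct gem_object {
    struct mem_region *region;       /* currently requested region */
    struct mem_region **placements;  /* allowed regions, priority order */
    unsigned int n_placements;
};

/* Before the fix: a user object (n_placements > 0) always reported
 * placements[0], so an LMEM+SMEM object migrated to SMEM would be
 * shoved back into LMEM on the next get_pages. */
static struct mem_region *requested_before(const struct gem_object *obj)
{
    return obj->n_placements ? obj->placements[0] : obj->region;
}

/* After the fix: always honour the requested region. */
static struct mem_region *requested_after(const struct gem_object *obj)
{
    return obj->region;
}
```

With the fix, a caller such as a migrate path that sets the requested region to SMEM actually gets SMEM on the next page acquisition, which is the behaviour the commit message describes.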
[PATCH 6/7] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
From: Thomas Hellström If our exported dma-bufs are imported by another instance of our driver, that instance will typically have the imported dma-bufs locked during dma_buf_map_attachment(). But the exporter also locks the same reservation object in the map_dma_buf() callback, which leads to recursive locking. So taking the lock inside _pin_pages_unlocked() is incorrect. Additionally, the current pinning code path is contrary to the defined way that pinning should occur. Remove the explicit pin/unpin from the map/umap functions and move them to the attach/detach allowing correct locking to occur, and to match the static dma-buf drm_prime pattern. Add a live selftest to exercise both dynamic and non-dynamic exports. v2: - Extend the selftest with a fake dynamic importer. - Provide real pin and unpin callbacks to not abuse the interface. v3: (ruhl) - Remove the dynamic export support and move the pinning into the attach/detach path. v4: (ruhl) - Put pages does not need to assert on the dma-resv v5: (jason) - Lock around dma_buf_unmap_attachment() when emulating a dynamic importer in the subtests. - Use pin_pages_unlocked v6: (jason) - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests v7: (mauld) - Use __i915_gem_object_get_pages (2 __underscores) instead of the 4 underscore version in the selftests v8: (mauld) - Drop the kernel doc from the static i915_gem_dmabuf_attach function - Add missing "err = PTR_ERR()" to a bunch of selftest error cases Reported-by: Michael J. Ruhl Signed-off-by: Thomas Hellström Signed-off-by: Michael J. 
Ruhl Signed-off-by: Jason Ekstrand Reviewed-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 37 -- .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 109 +- 2 files changed, 132 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 616c3a2f1baf0..59dc56ae14d6b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -12,6 +12,8 @@ #include "i915_gem_object.h" #include "i915_scatterlist.h" +I915_SELFTEST_DECLARE(static bool force_different_devices;) + static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf) { return to_intel_bo(buf->priv); @@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme struct scatterlist *src, *dst; int ret, i; - ret = i915_gem_object_pin_pages_unlocked(obj); - if (ret) - goto err; - /* Copy sg so that we make an independent mapping */ st = kmalloc(sizeof(struct sg_table), GFP_KERNEL); if (st == NULL) { ret = -ENOMEM; - goto err_unpin_pages; + goto err; } ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); @@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme sg_free_table(st); err_free: kfree(st); -err_unpin_pages: - i915_gem_object_unpin_pages(obj); err: return ERR_PTR(ret); } @@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *sg, enum dma_data_direction dir) { - struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); - dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC); sg_free_table(sg); kfree(sg); - - i915_gem_object_unpin_pages(obj); } static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map) @@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct return err; } +static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf, + 
struct dma_buf_attachment *attach) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + + return i915_gem_object_pin_pages_unlocked(obj); +} + +static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attach) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + + i915_gem_object_unpin_pages(obj); +} + static const struct dma_buf_ops i915_dmabuf_ops = { + .attach = i915_gem_dmabuf_attach, + .detach = i915_gem_dmabuf_detach, .map_dma_buf = i915_gem_map_dma_buf, .unmap_dma_buf = i915_gem_unmap_dma_buf, .release = drm_gem_dmabuf_release, @@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj) struct sg_table *pages; unsigned int sg_page_sizes; + assert_object_held(obj); + pages = dma_buf_map_attachment(obj->base.import_attach,
[PATCH 7/7] drm/i915/gem: Migrate to system at dma-buf attach time (v7)
From: Thomas Hellström Until we support p2p dma or as a complement to that, migrate data to system memory at dma-buf attach time if possible. v2: - Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver selftest to migrate if we are LMEM capable. v3: - Migrate also in the pin() callback. v4: - Migrate in attach v5: (jason) - Lock around the migration v6: (jason) - Move the can_migrate check outside the lock - Rework the selftests to test more migration conditions. In particular, SMEM, LMEM, and LMEM+SMEM are all checked. v7: (mauld) - Misc style nits Signed-off-by: Thomas Hellström Signed-off-by: Michael J. Ruhl Reported-by: kernel test robot Signed-off-by: Jason Ekstrand Reviewed-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 - .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 87 ++- 2 files changed, 106 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 59dc56ae14d6b..afa34111de02e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + struct i915_gem_ww_ctx ww; + int err; + + if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) + return -EOPNOTSUPP; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM); + if (err) + continue; - return i915_gem_object_pin_pages_unlocked(obj); + err = i915_gem_object_wait_migration(obj, 0); + if (err) + continue; + + err = i915_gem_object_pin_pages(obj); + } + + return err; } static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c index 
d4ce01e6ee854..ffae7df5e4d7d 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c @@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg) return err; } -static int igt_dmabuf_import_same_driver(void *arg) +static int igt_dmabuf_import_same_driver_lmem(void *arg) { struct drm_i915_private *i915 = arg; + struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM]; + struct drm_i915_gem_object *obj; + struct drm_gem_object *import; + struct dma_buf *dmabuf; + int err; + + if (!lmem) + return 0; + + force_different_devices = true; + + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1); + if (IS_ERR(obj)) { + pr_err("__i915_gem_object_create_user failed with err=%ld\n", + PTR_ERR(obj)); + err = PTR_ERR(obj); + goto out_ret; + } + + dmabuf = i915_gem_prime_export(&obj->base, 0); + if (IS_ERR(dmabuf)) { + pr_err("i915_gem_prime_export failed with err=%ld\n", + PTR_ERR(dmabuf)); + err = PTR_ERR(dmabuf); + goto out; + } + + /* +* We expect an import of an LMEM-only object to fail with +* -EOPNOTSUPP because it can't be migrated to SMEM. 
+*/ + import = i915_gem_prime_import(&i915->drm, dmabuf); + if (!IS_ERR(import)) { + drm_gem_object_put(import); + pr_err("i915_gem_prime_import succeeded when it shouldn't have\n"); + err = -EINVAL; + } else if (PTR_ERR(import) != -EOPNOTSUPP) { + pr_err("i915_gem_prime_import failed with the wrong err=%ld\n", + PTR_ERR(import)); + err = PTR_ERR(import); + } + + dma_buf_put(dmabuf); +out: + i915_gem_object_put(obj); +out_ret: + force_different_devices = false; + return err; +} + +static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915, +struct intel_memory_region **regions, +unsigned int num_regions) +{ struct drm_i915_gem_object *obj, *import_obj; struct drm_gem_object *import; struct dma_buf *dmabuf; @@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg) int err; force_different_devices = true; - obj = i915_gem_object_create_shmem(i915, PAGE_SIZE); + + obj = __i915_gem_obje
Re: [Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global
On Wed, Jul 21, 2021 at 1:56 PM Daniel Vetter wrote: > > On Wed, Jul 21, 2021 at 05:25:41PM +0100, Matthew Auld wrote: > > On 21/07/2021 16:23, Jason Ekstrand wrote: > > > There's no reason that I can tell why this should be per-i915_buddy_mm > > > and doing so causes KMEM_CACHE to throw dmesg warnings because it tries > > > to create a debugfs entry with the name i915_buddy_block multiple times. > > > We could handle this by carefully giving each slab its own name but that > > > brings its own pain because then we have to store that string somewhere > > > and manage the lifetimes of the different slabs. The most likely > > > outcome would be a global atomic which we increment to get a new name or > > > something like that. > > > > > > The much easier solution is to use the i915_globals system like we do > > > for every other slab in i915. This ensures that we have exactly one of > > > them for each i915 driver load and it gets neatly created on module load > > > and destroyed on module unload. Using the globals system also means > > > that it's now tied into the shrink handler so we can properly respond to > > > low-memory situations. > > > > > > Signed-off-by: Jason Ekstrand > > > Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man") > > > Cc: Matthew Auld > > > Cc: Christian König > > > > It was intentionally ripped out with the idea that we would be moving the > > buddy stuff into ttm, and so part of that was trying to get rid of some > > of the i915 specifics, like this globals thing. > > > > Reviewed-by: Matthew Auld > > I just sent out a patch to put i915_globals on a diet, so maybe we can > hold this patch here a bit when there's other reasons for why this is > special? This is required to get rid of the dmesg warnings. > Or at least not make this use the i915_globals stuff and instead just link > up the init/exit function calls directly into Jason's new table, so that > we don't have a merge conflict here? 
I'm happy to deal with merge conflicts however they land. --Jason > -Daniel > > > > > > --- > > > drivers/gpu/drm/i915/i915_buddy.c | 44 ++--- > > > drivers/gpu/drm/i915/i915_buddy.h | 3 +- > > > drivers/gpu/drm/i915/i915_globals.c | 2 ++ > > > 3 files changed, 38 insertions(+), 11 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_buddy.c > > > b/drivers/gpu/drm/i915/i915_buddy.c > > > index 29dd7d0310c1f..911feedad4513 100644 > > > --- a/drivers/gpu/drm/i915/i915_buddy.c > > > +++ b/drivers/gpu/drm/i915/i915_buddy.c > > > @@ -8,8 +8,14 @@ > > > #include "i915_buddy.h" > > > #include "i915_gem.h" > > > +#include "i915_globals.h" > > > #include "i915_utils.h" > > > +static struct i915_global_buddy { > > > + struct i915_global base; > > > + struct kmem_cache *slab_blocks; > > > +} global; > > > + > > > static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm > > > *mm, > > > struct i915_buddy_block > > > *parent, > > > unsigned int order, > > > @@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct > > > i915_buddy_mm *mm, > > > GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER); > > > - block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL); > > > + block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL); > > > if (!block) > > > return NULL; > > > @@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct > > > i915_buddy_mm *mm, > > > static void i915_block_free(struct i915_buddy_mm *mm, > > > struct i915_buddy_block *block) > > > { > > > - kmem_cache_free(mm->slab_blocks, block); > > > + kmem_cache_free(global.slab_blocks, block); > > > } > > > static void mark_allocated(struct i915_buddy_block *block) > > > @@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 > > > size, u64 chunk_size) > > > GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER); > > > - mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN); > > > - if (!mm->slab_blocks) > > > - return -ENOMEM; > > > - > > &
Re: [PATCH] drm/i915: Ditch i915 globals shrink infrastructure
On Wed, Jul 21, 2021 at 1:32 PM Daniel Vetter wrote: > > This essentially reverts > > commit 84a1074920523430f9dc30ff907f4801b4820072 > Author: Chris Wilson > Date: Wed Jan 24 11:36:08 2018 + > > drm/i915: Shrink the GEM kmem_caches upon idling > > mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it > then we need to fix that there, not hand-roll our own slab shrinking > code in i915. > > Noticed while reviewing a patch set from Jason to fix up some issues > in our i915_init() and i915_exit() module load/cleanup code. Now that > i915_globals.c isn't any different than normal init/exit functions, we > should convert them over to one unified table and remove > i915_globals.[hc] entirely. Mind throwing in a comment somewhere about how i915 is one of only two users of kmem_cache_shrink() in the entire kernel? That also seems to be pretty good evidence that it's not useful. Reviewed-by: Jason Ekstrand Feel free to land at-will and I'll deal with merge conflicts on my end. > Cc: David Airlie > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 -- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 -- > drivers/gpu/drm/i915/gt/intel_context.c | 6 -- > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 4 - > drivers/gpu/drm/i915/i915_active.c | 6 -- > drivers/gpu/drm/i915/i915_globals.c | 95 - > drivers/gpu/drm/i915/i915_globals.h | 3 - > drivers/gpu/drm/i915/i915_request.c | 7 -- > drivers/gpu/drm/i915/i915_scheduler.c | 7 -- > drivers/gpu/drm/i915/i915_vma.c | 6 -- > 10 files changed, 146 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 7d6f52d8a801..bf2a2319353a 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -2280,18 +2280,12 @@ i915_gem_engines_iter_next(struct > i915_gem_engines_iter *it) > #include "selftests/i915_gem_context.c" > #endif > > -static void 
i915_global_gem_context_shrink(void) > -{ > - kmem_cache_shrink(global.slab_luts); > -} > - > static void i915_global_gem_context_exit(void) > { > kmem_cache_destroy(global.slab_luts); > } > > static struct i915_global_gem_context global = { { > - .shrink = i915_global_gem_context_shrink, > .exit = i915_global_gem_context_exit, > } }; > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c > b/drivers/gpu/drm/i915/gem/i915_gem_object.c > index 9da7b288b7ed..5c21cff33199 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > @@ -664,18 +664,12 @@ void i915_gem_init__objects(struct drm_i915_private > *i915) > INIT_WORK(&i915->mm.free_work, __i915_gem_free_work); > } > > -static void i915_global_objects_shrink(void) > -{ > - kmem_cache_shrink(global.slab_objects); > -} > - > static void i915_global_objects_exit(void) > { > kmem_cache_destroy(global.slab_objects); > } > > static struct i915_global_object global = { { > - .shrink = i915_global_objects_shrink, > .exit = i915_global_objects_exit, > } }; > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index bd63813c8a80..c1338441cc1d 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -398,18 +398,12 @@ void intel_context_fini(struct intel_context *ce) > i915_active_fini(&ce->active); > } > > -static void i915_global_context_shrink(void) > -{ > - kmem_cache_shrink(global.slab_ce); > -} > - > static void i915_global_context_exit(void) > { > kmem_cache_destroy(global.slab_ce); > } > > static struct i915_global_context global = { { > - .shrink = i915_global_context_shrink, > .exit = i915_global_context_exit, > } }; > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > index aef3084e8b16..d86825437516 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > @@ -67,8 +67,6 
@@ static int __gt_unpark(struct intel_wakeref *wf) > > GT_TRACE(gt, "\n"); > > - i915_globals_unpark(); > - > /* > * It seems that the DMC likes to transition between the DC states a > lot > * when there are no connected displays (no active power domains) > during > @@ -116,8 +114
Re: [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld wrote: > > From: Chris Wilson > > Jason Ekstrand requested a more efficient method than userptr+set-domain > to determine if the userptr object was backed by a complete set of pages > upon creation. To be more efficient than simply populating the userptr > using get_user_pages() (as done by the call to set-domain or execbuf), > we can walk the tree of vm_area_struct and check for gaps or vma not > backed by struct page (VM_PFNMAP). The question is how to handle > VM_MIXEDMAP which may be either struct page or pfn backed... > > With discrete are going to drop support for set_domain(), so offering a > way to probe the pages, without having to resort to dummy batches has > been requested. > > v2: > - add new query param for the PROPBE flag, so userspace can easily > check if the kernel supports it(Jason). > - use mmap_read_{lock, unlock}. > - add some kernel-doc. > > Testcase: igt/gem_userptr_blits/probe > Signed-off-by: Chris Wilson > Signed-off-by: Matthew Auld > Cc: Thomas Hellström > Cc: Maarten Lankhorst > Cc: Tvrtko Ursulin > Cc: Jordan Justen > Cc: Kenneth Graunke > Cc: Jason Ekstrand > Cc: Daniel Vetter > Cc: Ramalingam C > --- > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 - > drivers/gpu/drm/i915/i915_getparam.c| 3 ++ > include/uapi/drm/i915_drm.h | 18 ++ > 3 files changed, 60 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > index 56edfeff8c02..fd6880328596 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops > i915_gem_userptr_ops = { > > #endif > > +static int > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > +{ > + const unsigned long end = addr + len; > + struct vm_area_struct *vma; > + int ret = -EFAULT; > + > + mmap_read_lock(mm); > + for (vma = find_vma(mm, addr); 
vma; vma = vma->vm_next) { > + if (vma->vm_start > addr) Why isn't this > end? Are we somehow guaranteed that one vma covers the entire range? > + break; > + > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > + break; > + > + if (vma->vm_end >= end) { > + ret = 0; > + break; > + } > + > + addr = vma->vm_end; > + } > + mmap_read_unlock(mm); > + > + return ret; > +} > + > /* > * Creates a new mm object that wraps some normal memory from the process > * context - user memory. > @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > } > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > - I915_USERPTR_UNSYNCHRONIZED)) > + I915_USERPTR_UNSYNCHRONIZED | > + I915_USERPTR_PROBE)) > return -EINVAL; > > if (i915_gem_object_size_2big(args->user_size)) > @@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > return -ENODEV; > } > > + if (args->flags & I915_USERPTR_PROBE) { > + /* > +* Check that the range pointed to represents real struct > +* pages and not iomappings (at this moment in time!) > +*/ > + ret = probe_range(current->mm, args->user_ptr, > args->user_size); > + if (ret) > + return ret; > + } > + > #ifdef CONFIG_MMU_NOTIFIER > obj = i915_gem_object_alloc(); > if (obj == NULL) > diff --git a/drivers/gpu/drm/i915/i915_getparam.c > b/drivers/gpu/drm/i915/i915_getparam.c > index 24e18219eb50..d6d2e1a10d14 100644 > --- a/drivers/gpu/drm/i915/i915_getparam.c > +++ b/drivers/gpu/drm/i915/i915_getparam.c > @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void > *data, > case I915_PARAM_PERF_REVISION: > value = i915_perf_ioctl_version(); > break; > + case I915_PARAM_HAS_USERPTR_PROBE: > + value = true; > + break; > default: > DRM_DEBUG("Unknown parameter %d\n", param->param); > return -EINVAL; > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index e20eeeca7a1c..2e
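The VMA walk in the quoted probe_range() can be hard to follow in diff form. Below is a schematic, self-contained model of the same loop (`struct vma` and the hard-coded `-14` for `-EFAULT` are stand-ins, not kernel API). It also shows why the loop compares `vm_start` against the running `addr` rather than against `end`: `addr` is advanced to `vm_end` on each iteration, so a hole anywhere in the range is caught on a later pass.

```c
/* Schematic model of the probe_range() walk -- not kernel code.
 * A vm_area_struct is reduced to [start, end) plus a pfnmap flag. */
struct vma { unsigned long start, end; int pfnmap; };

static int probe_range_model(const struct vma *vmas, int n,
                             unsigned long addr, unsigned long len)
{
    const unsigned long end = addr + len;
    int i;

    for (i = 0; i < n; i++) {
        const struct vma *v = &vmas[i];

        if (v->end <= addr)   /* stand-in for find_vma(mm, addr) */
            continue;
        if (v->start > addr)  /* hole at the current position */
            return -14;       /* -EFAULT */
        if (v->pfnmap)        /* not backed by struct page */
            return -14;
        if (v->end >= end)    /* range fully covered */
            return 0;
        addr = v->end;        /* advance past this vma */
    }
    return -14;               /* ran out of vmas before "end" */
}
```

A partial munmap splits a mapping into two vmas with a gap between them; in the model that gap trips the `v->start > addr` check on the second iteration, matching the behaviour Matthew describes in the follow-up reply.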
Re: [Intel-gfx] [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation
On Thu, Jul 22, 2021 at 3:44 AM Matthew Auld wrote: > > On Wed, 21 Jul 2021 at 21:28, Jason Ekstrand wrote: > > > > On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld wrote: > > > > > > From: Chris Wilson > > > > > > Jason Ekstrand requested a more efficient method than userptr+set-domain > > > to determine if the userptr object was backed by a complete set of pages > > > upon creation. To be more efficient than simply populating the userptr > > > using get_user_pages() (as done by the call to set-domain or execbuf), > > > we can walk the tree of vm_area_struct and check for gaps or vma not > > > backed by struct page (VM_PFNMAP). The question is how to handle > > > VM_MIXEDMAP which may be either struct page or pfn backed... > > > > > > With discrete are going to drop support for set_domain(), so offering a > > > way to probe the pages, without having to resort to dummy batches has > > > been requested. > > > > > > v2: > > > - add new query param for the PROPBE flag, so userspace can easily > > > check if the kernel supports it(Jason). > > > - use mmap_read_{lock, unlock}. > > > - add some kernel-doc. 
> > > > > > Testcase: igt/gem_userptr_blits/probe > > > Signed-off-by: Chris Wilson > > > Signed-off-by: Matthew Auld > > > Cc: Thomas Hellström > > > Cc: Maarten Lankhorst > > > Cc: Tvrtko Ursulin > > > Cc: Jordan Justen > > > Cc: Kenneth Graunke > > > Cc: Jason Ekstrand > > > Cc: Daniel Vetter > > > Cc: Ramalingam C > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 - > > > drivers/gpu/drm/i915/i915_getparam.c| 3 ++ > > > include/uapi/drm/i915_drm.h | 18 ++ > > > 3 files changed, 60 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > index 56edfeff8c02..fd6880328596 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops > > > i915_gem_userptr_ops = { > > > > > > #endif > > > > > > +static int > > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > > > +{ > > > + const unsigned long end = addr + len; > > > + struct vm_area_struct *vma; > > > + int ret = -EFAULT; > > > + > > > + mmap_read_lock(mm); > > > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > > > + if (vma->vm_start > addr) > > > > Why isn't this > end? Are we somehow guaranteed that one vma covers > > the entire range? > > AFAIK we are just making sure we don't have a hole(note that we also > update addr below), for example the user might have done a partial > munmap. There could be multiple vma's if the kernel was unable to > merge them. If we reach the vm_end >= end, then we know we have a > "valid" range. Ok. That wasn't obvious to me but I see the addr update now. Makes sense. Might be worth a one-line comment for the next guy. Either way, Reviewed-by: Jason Ekstrand Thanks for wiring this up! 
--Jason > > > > > + break; > > > + > > > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > > > + break; > > > + > > > + if (vma->vm_end >= end) { > > > + ret = 0; > > > + break; > > > + } > > > + > > > + addr = vma->vm_end; > > > + } > > > + mmap_read_unlock(mm); > > > + > > > + return ret; > > > +} > > > + > > > /* > > > * Creates a new mm object that wraps some normal memory from the process > > > * context - user memory. > > > @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > > } > > > > > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > > > - I915_USERPTR_UNSYNCHRONIZED)) > > > + I915_USERPTR_UNSYNCHRONIZED | > > > + I915_USERPTR_PROBE)) > > > return -EINV
Re: [Intel-gfx] [PATCH] drm/i915: Ditch i915 globals shrink infrastructure
On Thu, Jul 22, 2021 at 5:34 AM Tvrtko Ursulin wrote: > On 22/07/2021 11:16, Daniel Vetter wrote: > > On Thu, Jul 22, 2021 at 11:02:55AM +0100, Tvrtko Ursulin wrote: > >> On 21/07/2021 19:32, Daniel Vetter wrote: > >>> This essentially reverts > >>> > >>> commit 84a1074920523430f9dc30ff907f4801b4820072 > >>> Author: Chris Wilson > >>> Date: Wed Jan 24 11:36:08 2018 + > >>> > >>> drm/i915: Shrink the GEM kmem_caches upon idling > >>> > >>> mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it > >>> then we need to fix that there, not hand-roll our own slab shrinking > >>> code in i915. > >> > >> This is a somewhat incomplete statement which ignores a couple of angles so I > >> wish there was a bit more time to respond before steam rolling it in. :( > >> > >> The removed code was not a hand rolled shrinker, but about managing slab > >> sizes in face of bursty workloads. Core code does not know when i915 is > >> active and when it is idle, so calling kmem_cache_shrink() after going idle > >> was supposed to help with house keeping by doing house keeping work > >> outside > >> of the latency sensitive phase. > >> > >> To "fix" (improve really) it in core as you suggest, would need some method > >> of signaling when a slab user feels it is an opportune moment to do this > >> house > >> keeping. And kmem_cache_shrink is just that so I don't see the problem. > >> > >> Granted, the argument that kmem_cache_shrink is not much used is a valid one so > >> discussion overall is definitely valid. Because on the higher level we > >> could > >> definitely talk about which workloads actually benefit from this code and > >> how much, which probably no one knows at this point. Pardon me for being a bit curt here, but that discussion should have happened 3.5 years ago when this landed. The entire justification we have on record for this change is, "When we finally decide the gpu is idle, that is a good time to shrink our kmem_caches." 
We have no record of any workloads which benefit from this and no recorded way to reproduce any supposed benefits, even if it requires a microbenchmark. But we added over 100 lines of code for it anyway, including a bunch of hand-rolled RCU juggling. Ripping out unjustified complexity is almost always justified, IMO. The burden of proof here isn't on Daniel to show he isn't regressing anything but it was on you and Chris to show that complexity was worth something back in 2018 when this landed. --Jason > >> But in general I think you needed to leave more time for discussion. 12 > >> hours is way too short. > > > > It's 500+ users of kmem_cache_create vs i915 doing kmem_cache_shrink. And > > There are two other callers for the record. ;) > > > I guarantee you there's slab users that churn through more allocations > > than we do, and are more bursty. > > I wasn't disputing that. > > > An extraordinary claim like this needs extraordinary evidence. And then a > > discussion with core mm/ folks so that we can figure out how to solve the > > discovered problem best for the other 500+ users of slabs in-tree, so that > > everyone benefits. Not just i915 gpu workloads. > > Yep, not disputing that either. Noticed I wrote it was a valid argument? > > But discussion with mm folks could also have happened before you steam > rolled the "revert" in though. Perhaps tey would have said > kmem_cache_shrink is the way. Or maybe it isn't. Or maybe they would > have said meh. I just don't see how the rush was justified given the > code in question. > > Regards, > > Tvrtko > > > -Daniel > > > >>> Noticed while reviewing a patch set from Jason to fix up some issues > >>> in our i915_init() and i915_exit() module load/cleanup code. Now that > >>> i915_globals.c isn't any different than normal init/exit functions, we > >>> should convert them over to one unified table and remove > >>> i915_globals.[hc] entirely. 
> >>> > >>> Cc: David Airlie > >>> Cc: Jason Ekstrand > >>> Signed-off-by: Daniel Vetter > >>> --- > >>>drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 -- > >>>drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 -- > >>>drivers/gpu/drm/i915/gt/intel_context.c | 6 -- > >>>drivers/gpu/drm/i915/gt/intel_gt_pm.c | 4 - > >>
[PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)
This patch series fixes an issue with discrete graphics on Intel where we allowed dma-buf import while leaving the object in local memory. This breaks down pretty badly if the import happened on a different physical device. v7: - Drop "drm/i915/gem/ttm: Place new BOs in the requested region" - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()" - Misc. review feedback from Matthew Auld v8: - Misc. review feedback from Matthew Auld v9: - Replace the i915/ttm patch with two that are hopefully more correct Jason Ekstrand (6): drm/i915/gem: Check object_can_migrate from object_migrate drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2) drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create() drm/i915/gem: Unify user object creation (v3) drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails Thomas Hellström (2): drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8) drm/i915/gem: Migrate to system at dma-buf attach time (v7) drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 58 -- drivers/gpu/drm/i915/gem/i915_gem_object.c| 20 +- drivers/gpu/drm/i915/gem/i915_gem_object.h| 4 + drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +- .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 190 +- .../drm/i915/gem/selftests/i915_gem_migrate.c | 15 -- 7 files changed, 341 insertions(+), 136 deletions(-) -- 2.31.1
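The headline behaviour of the series (the last two patches) can be summarized in a toy model: a cross-device dma-buf attach either migrates the exported object to system memory and pins it there, or rejects the import outright. This is an illustrative sketch with made-up types and a hard-coded errno value, not the driver implementation:

```c
/* Toy model -- not driver code. An exporter-side object must be
 * migratable to system memory (SMEM) before a cross-device importer
 * may attach; an LMEM-only object is rejected with -EOPNOTSUPP. */
enum region { SMEM, LMEM };

struct object {
    enum region current_region;
    int can_migrate_smem;  /* i.e. SMEM is in the placement list */
    int pinned;
};

#define MODEL_EOPNOTSUPP 95

static int dmabuf_attach_model(struct object *obj)
{
    if (!obj->can_migrate_smem)
        return -MODEL_EOPNOTSUPP; /* LMEM-only: reject the import */

    obj->current_region = SMEM;   /* migrate before pinning */
    obj->pinned = 1;              /* pin for the attachment's lifetime */
    return 0;
}
```

The rejection branch is exactly what the new igt_dmabuf_import_same_driver_lmem selftest checks for, and the migrate-then-pin branch is what the attach callback in patch 8 performs under the ww lock.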
[PATCH 2/8] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)
Since we don't allow changing the set of regions after creation, we can make ext_set_placements() build up the region set directly in the create_ext and assign it to the object later. This is similar to what we did for contexts with the proto-context only simpler because there's no funny object shuffling. This will be used in the next patch to allow us to de-duplicate a bunch of code. Also, since we know the maximum number of regions up-front, we can use a fixed-size temporary array for the regions. This simplifies memory management a bit for this new delayed approach. v2 (Matthew Auld): - Get rid of MAX_N_PLACEMENTS - Drop kfree(placements) from set_placements() v3 (Matthew Auld): - Properly set ext_data->n_placements Signed-off-by: Jason Ekstrand Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 82 -- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 51f92e4b1a69d..aa687b10dcd45 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -27,10 +27,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object *obj) return max_page_size; } -static void object_set_placements(struct drm_i915_gem_object *obj, - struct intel_memory_region **placements, - unsigned int n_placements) +static int object_set_placements(struct drm_i915_gem_object *obj, +struct intel_memory_region **placements, +unsigned int n_placements) { + struct intel_memory_region **arr; + unsigned int i; + GEM_BUG_ON(!n_placements); /* @@ -44,9 +47,20 @@ static void object_set_placements(struct drm_i915_gem_object *obj, obj->mm.placements = &i915->mm.regions[mr->id]; obj->mm.n_placements = 1; } else { - obj->mm.placements = placements; + arr = kmalloc_array(n_placements, + sizeof(struct intel_memory_region *), + GFP_KERNEL); + if (!arr) + return -ENOMEM; + + for (i = 0; i < n_placements; i++) + arr[i] = placements[i]; + + 
obj->mm.placements = arr; obj->mm.n_placements = n_placements; } + + return 0; } static int i915_gem_publish(struct drm_i915_gem_object *obj, @@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file, return -ENOMEM; mr = intel_memory_region_by_type(to_i915(dev), mem_type); - object_set_placements(obj, &mr, 1); + ret = object_set_placements(obj, &mr, 1); + if (ret) + goto object_free; ret = i915_gem_setup(obj, args->size); if (ret) @@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, return -ENOMEM; mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM); - object_set_placements(obj, &mr, 1); + ret = object_set_placements(obj, &mr, 1); + if (ret) + goto object_free; ret = i915_gem_setup(obj, args->size); if (ret) @@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, struct create_ext { struct drm_i915_private *i915; - struct drm_i915_gem_object *vanilla_object; + struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; + unsigned int n_placements; }; static void repr_placements(char *buf, size_t size, @@ -230,8 +249,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, struct drm_i915_private *i915 = ext_data->i915; struct drm_i915_gem_memory_class_instance __user *uregions = u64_to_user_ptr(args->regions); - struct drm_i915_gem_object *obj = ext_data->vanilla_object; - struct intel_memory_region **placements; + struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; u32 mask; int i, ret = 0; @@ -245,6 +263,8 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, ret = -EINVAL; } + BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements)); + BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != ARRAY_SIZE(placements)); if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) { drm_dbg(&i915->drm, "num_regions is too large\n"); ret = -EINVAL; @@ -253,21 +273,13 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions 
*args, if (ret) return ret; - placements
[PATCH 1/8] drm/i915/gem: Check object_can_migrate from object_migrate
We don't roll them together entirely because there are still a couple cases where we want a separate can_migrate check. For instance, the display code checks that you can migrate a buffer to LMEM before it accepts it in fb_create. The dma-buf import code also uses it to do an early check and return a different error code if someone tries to attach a LMEM-only dma-buf to another driver. However, no one actually wants to call object_migrate when can_migrate has failed. The stated intention is for self-tests but none of those actually take advantage of this unsafe migration. Signed-off-by: Jason Ekstrand Cc: Daniel Vetter Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_object.c| 13 ++--- .../gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 15 --- 2 files changed, 2 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 5c21cff33199e..d09bd9bdb38ac 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -584,12 +584,6 @@ bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj, * completed yet, and to accomplish that, i915_gem_object_wait_migration() * must be called. * - * This function is a bit more permissive than i915_gem_object_can_migrate() - * to allow for migrating objects where the caller knows exactly what is - * happening. For example within selftests. More specifically this - * function allows migrating I915_BO_ALLOC_USER objects to regions - * that are not in the list of allowable regions. - * * Note: the @ww parameter is not used yet, but included to make sure * callers put some effort into obtaining a valid ww ctx if one is * available. 
@@ -616,11 +610,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj, if (obj->mm.region == mr) return 0; - if (!i915_gem_object_evictable(obj)) - return -EBUSY; - - if (!obj->ops->migrate) - return -EOPNOTSUPP; + if (!i915_gem_object_can_migrate(obj, id)) + return -EINVAL; return obj->ops->migrate(obj, mr); } diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c index 0b7144d2991ca..28a700f08b49a 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c @@ -61,11 +61,6 @@ static int igt_create_migrate(struct intel_gt *gt, enum intel_region_id src, if (err) continue; - if (!i915_gem_object_can_migrate(obj, dst)) { - err = -EINVAL; - continue; - } - err = i915_gem_object_migrate(obj, &ww, dst); if (err) continue; @@ -114,11 +109,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww, return err; if (i915_gem_object_is_lmem(obj)) { - if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) { - pr_err("object can't migrate to smem.\n"); - return -EINVAL; - } - err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM); if (err) { pr_err("Object failed migration to smem\n"); @@ -137,11 +127,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww, } } else { - if (!i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) { - pr_err("object can't migrate to lmem.\n"); - return -EINVAL; - } - err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM); if (err) { pr_err("Object failed migration to lmem\n"); -- 2.31.1
[PATCH 3/8] drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()
This doesn't really fix anything serious since the chances of a client creating and destroying a mass of dumb BOs is pretty low. However, it is called by the other two create IOCTLs to garbage collect old objects. Call it here too for consistency. Signed-off-by: Jason Ekstrand Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index aa687b10dcd45..adcce37c04b8d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -151,6 +151,8 @@ i915_gem_dumb_create(struct drm_file *file, if (args->pitch < args->width) return -EINVAL; + i915_gem_flush_free_objects(i915); + args->size = mul_u32_u32(args->pitch, args->height); mem_type = INTEL_MEMORY_SYSTEM; -- 2.31.1
[PATCH 4/8] drm/i915/gem: Unify user object creation (v3)
Instead of hand-rolling the same three calls in each function, pull them into an i915_gem_object_create_user helper. Apart from re-ordering of the placements array ENOMEM check, there should be no functional change. v2 (Matthew Auld): - Add the call to i915_gem_flush_free_objects() from i915_gem_dumb_create() in a separate patch - Move i915_gem_object_alloc() below the simple error checks v3 (Matthew Auld): - Add __ to i915_gem_object_create_user and kerneldoc which warns the caller that it's not validating anything. Signed-off-by: Jason Ekstrand Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 119 ++--- drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 + 2 files changed, 58 insertions(+), 65 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index adcce37c04b8d..23fee13a33844 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -11,13 +11,14 @@ #include "i915_trace.h" #include "i915_user_extensions.h" -static u32 object_max_page_size(struct drm_i915_gem_object *obj) +static u32 object_max_page_size(struct intel_memory_region **placements, + unsigned int n_placements) { u32 max_page_size = 0; int i; - for (i = 0; i < obj->mm.n_placements; i++) { - struct intel_memory_region *mr = obj->mm.placements[i]; + for (i = 0; i < n_placements; i++) { + struct intel_memory_region *mr = placements[i]; GEM_BUG_ON(!is_power_of_2(mr->min_page_size)); max_page_size = max_t(u32, max_page_size, mr->min_page_size); @@ -81,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj, return 0; } -static int -i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) +/** + * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT + * @i915: i915 private + * @size: size of the buffer, in bytes + * @placements: possible placement regions, in priority order + * @n_placements: number of possible placement regions + * + * This 
function is exposed primarily for selftests and does very little + * error checking. It is assumed that the set of placement regions has + * already been verified to be valid. + */ +struct drm_i915_gem_object * +__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size, + struct intel_memory_region **placements, + unsigned int n_placements) { - struct intel_memory_region *mr = obj->mm.placements[0]; + struct intel_memory_region *mr = placements[0]; + struct drm_i915_gem_object *obj; unsigned int flags; int ret; - size = round_up(size, object_max_page_size(obj)); + i915_gem_flush_free_objects(i915); + + size = round_up(size, object_max_page_size(placements, n_placements)); if (size == 0) - return -EINVAL; + return ERR_PTR(-EINVAL); /* For most of the ABI (e.g. mmap) we think in system pages */ GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE)); if (i915_gem_object_size_2big(size)) - return -E2BIG; + return ERR_PTR(-E2BIG); + + obj = i915_gem_object_alloc(); + if (!obj) + return ERR_PTR(-ENOMEM); + + ret = object_set_placements(obj, placements, n_placements); + if (ret) + goto object_free; /* * I915_BO_ALLOC_USER will make sure the object is cleared before @@ -106,12 +131,18 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size) ret = mr->ops->init_object(mr, obj, size, 0, flags); if (ret) - return ret; + goto object_free; GEM_BUG_ON(size != obj->base.size); trace_i915_gem_object_create(obj); - return 0; + return obj; + +object_free: + if (obj->mm.n_placements > 1) + kfree(obj->mm.placements); + i915_gem_object_free(obj); + return ERR_PTR(ret); } int @@ -124,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file, enum intel_memory_type mem_type; int cpp = DIV_ROUND_UP(args->bpp, 8); u32 format; - int ret; switch (cpp) { case 1: @@ -151,32 +181,19 @@ i915_gem_dumb_create(struct drm_file *file, if (args->pitch < args->width) return -EINVAL; - i915_gem_flush_free_objects(i915); - args->size = mul_u32_u32(args->pitch, args->height); mem_type = 
INTEL_MEMORY_SYSTEM; if (HAS_LMEM(to_i915(dev))) mem_type = INTEL_MEMORY_LOCAL; - obj = i915_gem_object_alloc(); - if (!obj) - return -ENOMEM; - mr = intel_memory_region_by_type(to_i915(dev), mem_type); -
[PATCH 5/8] drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
__i915_ttm_get_pages does two things. First, it calls ttm_bo_validate() to check the given placement and migrate the BO if needed. Then, it updates the GEM object to match, in case the object was migrated. If no migration occurred, however, we might still have pages on the GEM object, in which case we don't need to fetch them from TTM and call __i915_gem_object_set_pages. This hasn't been a problem before because the primary user of __i915_ttm_get_pages is __i915_gem_object_get_pages, which only calls it if the GEM object doesn't have pages. However, i915_ttm_migrate also uses __i915_ttm_get_pages to do the migration, so it was unsafe to call on an already populated object. This patch checks i915_gem_object_has_pages() before calling __i915_gem_object_set_pages so i915_ttm_migrate is safe to call, even on populated objects. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index f253b11e9e367..771eb2963123f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -662,13 +662,14 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj, i915_ttm_adjust_gem_after_move(obj); } - GEM_WARN_ON(obj->mm.pages); - /* Object either has a page vector or is an iomem object */ - st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st; - if (IS_ERR(st)) - return PTR_ERR(st); + if (!i915_gem_object_has_pages(obj)) { + /* Object either has a page vector or is an iomem object */ + st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st; + if (IS_ERR(st)) + return PTR_ERR(st); - __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl)); + __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl)); + } return ret; } -- 2.31.1
[PATCH 8/8] drm/i915/gem: Migrate to system at dma-buf attach time (v7)
From: Thomas Hellström Until we support p2p dma or as a complement to that, migrate data to system memory at dma-buf attach time if possible. v2: - Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver selftest to migrate if we are LMEM capable. v3: - Migrate also in the pin() callback. v4: - Migrate in attach v5: (jason) - Lock around the migration v6: (jason) - Move the can_migrate check outside the lock - Rework the selftests to test more migration conditions. In particular, SMEM, LMEM, and LMEM+SMEM are all checked. v7: (mauld) - Misc style nits Signed-off-by: Thomas Hellström Signed-off-by: Michael J. Ruhl Reported-by: kernel test robot Signed-off-by: Jason Ekstrand Reviewed-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 - .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 87 ++- 2 files changed, 106 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 59dc56ae14d6b..afa34111de02e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + struct i915_gem_ww_ctx ww; + int err; + + if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) + return -EOPNOTSUPP; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM); + if (err) + continue; - return i915_gem_object_pin_pages_unlocked(obj); + err = i915_gem_object_wait_migration(obj, 0); + if (err) + continue; + + err = i915_gem_object_pin_pages(obj); + } + + return err; } static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c index 
d4ce01e6ee854..ffae7df5e4d7d 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c @@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg) return err; } -static int igt_dmabuf_import_same_driver(void *arg) +static int igt_dmabuf_import_same_driver_lmem(void *arg) { struct drm_i915_private *i915 = arg; + struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM]; + struct drm_i915_gem_object *obj; + struct drm_gem_object *import; + struct dma_buf *dmabuf; + int err; + + if (!lmem) + return 0; + + force_different_devices = true; + + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1); + if (IS_ERR(obj)) { + pr_err("__i915_gem_object_create_user failed with err=%ld\n", + PTR_ERR(obj)); + err = PTR_ERR(obj); + goto out_ret; + } + + dmabuf = i915_gem_prime_export(&obj->base, 0); + if (IS_ERR(dmabuf)) { + pr_err("i915_gem_prime_export failed with err=%ld\n", + PTR_ERR(dmabuf)); + err = PTR_ERR(dmabuf); + goto out; + } + + /* +* We expect an import of an LMEM-only object to fail with +* -EOPNOTSUPP because it can't be migrated to SMEM. 
+*/ + import = i915_gem_prime_import(&i915->drm, dmabuf); + if (!IS_ERR(import)) { + drm_gem_object_put(import); + pr_err("i915_gem_prime_import succeeded when it shouldn't have\n"); + err = -EINVAL; + } else if (PTR_ERR(import) != -EOPNOTSUPP) { + pr_err("i915_gem_prime_import failed with the wrong err=%ld\n", + PTR_ERR(import)); + err = PTR_ERR(import); + } + + dma_buf_put(dmabuf); +out: + i915_gem_object_put(obj); +out_ret: + force_different_devices = false; + return err; +} + +static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915, +struct intel_memory_region **regions, +unsigned int num_regions) +{ struct drm_i915_gem_object *obj, *import_obj; struct drm_gem_object *import; struct dma_buf *dmabuf; @@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg) int err; force_different_devices = true; - obj = i915_gem_object_create_shmem(i915, PAGE_SIZE); + + obj = __i915_gem_obje
[PATCH 7/8] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
From: Thomas Hellström If our exported dma-bufs are imported by another instance of our driver, that instance will typically have the imported dma-bufs locked during dma_buf_map_attachment(). But the exporter also locks the same reservation object in the map_dma_buf() callback, which leads to recursive locking. So taking the lock inside _pin_pages_unlocked() is incorrect. Additionally, the current pinning code path is contrary to the defined way that pinning should occur. Remove the explicit pin/unpin from the map/umap functions and move them to the attach/detach allowing correct locking to occur, and to match the static dma-buf drm_prime pattern. Add a live selftest to exercise both dynamic and non-dynamic exports. v2: - Extend the selftest with a fake dynamic importer. - Provide real pin and unpin callbacks to not abuse the interface. v3: (ruhl) - Remove the dynamic export support and move the pinning into the attach/detach path. v4: (ruhl) - Put pages does not need to assert on the dma-resv v5: (jason) - Lock around dma_buf_unmap_attachment() when emulating a dynamic importer in the subtests. - Use pin_pages_unlocked v6: (jason) - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests v7: (mauld) - Use __i915_gem_object_get_pages (2 __underscores) instead of the 4 underscore version in the selftests v8: (mauld) - Drop the kernel doc from the static i915_gem_dmabuf_attach function - Add missing "err = PTR_ERR()" to a bunch of selftest error cases Reported-by: Michael J. Ruhl Signed-off-by: Thomas Hellström Signed-off-by: Michael J. 
Ruhl Signed-off-by: Jason Ekstrand Reviewed-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 37 -- .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 109 +- 2 files changed, 132 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 616c3a2f1baf0..59dc56ae14d6b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -12,6 +12,8 @@ #include "i915_gem_object.h" #include "i915_scatterlist.h" +I915_SELFTEST_DECLARE(static bool force_different_devices;) + static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf) { return to_intel_bo(buf->priv); @@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme struct scatterlist *src, *dst; int ret, i; - ret = i915_gem_object_pin_pages_unlocked(obj); - if (ret) - goto err; - /* Copy sg so that we make an independent mapping */ st = kmalloc(sizeof(struct sg_table), GFP_KERNEL); if (st == NULL) { ret = -ENOMEM; - goto err_unpin_pages; + goto err; } ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); @@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme sg_free_table(st); err_free: kfree(st); -err_unpin_pages: - i915_gem_object_unpin_pages(obj); err: return ERR_PTR(ret); } @@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *sg, enum dma_data_direction dir) { - struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); - dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC); sg_free_table(sg); kfree(sg); - - i915_gem_object_unpin_pages(obj); } static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map) @@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct return err; } +static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf, + 
struct dma_buf_attachment *attach) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + + return i915_gem_object_pin_pages_unlocked(obj); +} + +static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attach) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf); + + i915_gem_object_unpin_pages(obj); +} + static const struct dma_buf_ops i915_dmabuf_ops = { + .attach = i915_gem_dmabuf_attach, + .detach = i915_gem_dmabuf_detach, .map_dma_buf = i915_gem_map_dma_buf, .unmap_dma_buf = i915_gem_unmap_dma_buf, .release = drm_gem_dmabuf_release, @@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj) struct sg_table *pages; unsigned int sg_page_sizes; + assert_object_held(obj); + pages = dma_buf_map_attachment(obj->base.import_attach,
[PATCH 6/8] drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails
Without TTM, we have no such hook so we exit early, but this is fine because we use TTM on all LMEM platforms and, on integrated platforms, there is no real migration. If we do have the hook, it's better to just let TTM handle the migration because it knows where things are actually placed. This fixes a bug where i915_gem_object_migrate fails to migrate newly created LMEM objects. In that scenario, the object has obj->mm.region set to LMEM but TTM has it in SMEM because that's where all new objects are placed prior to getting actual pages. When we invoke i915_gem_object_migrate, it exits early because, from the point of view of the GEM object, it's already in LMEM and no migration is needed. Then, when we try to pin the pages, __i915_ttm_get_pages is called which, unaware of our failed attempt at a migration, places the object in SMEM. This only happens on newly created objects because they have this weird state where TTM thinks they're in SMEM, GEM thinks they're in LMEM, and the reality is that they don't exist at all. It's better if GEM just always calls into TTM and lets TTM handle things. That way the lies stay better contained. Once the migration is complete, the object will have pages, obj->mm.region will be correct, and we're done lying. 
Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index d09bd9bdb38ac..9d3497e1235a0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -607,12 +607,15 @@ int i915_gem_object_migrate(struct drm_i915_gem_object *obj, mr = i915->mm.regions[id]; GEM_BUG_ON(!mr); - if (obj->mm.region == mr) - return 0; - if (!i915_gem_object_can_migrate(obj, id)) return -EINVAL; + if (!obj->ops->migrate) { + if (GEM_WARN_ON(obj->mm.region != mr)) + return -EINVAL; + return 0; + } + return obj->ops->migrate(obj, mr); } -- 2.31.1
Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044 On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld wrote: > > From: Chris Wilson > > Jason Ekstrand requested a more efficient method than userptr+set-domain > to determine if the userptr object was backed by a complete set of pages > upon creation. To be more efficient than simply populating the userptr > using get_user_pages() (as done by the call to set-domain or execbuf), > we can walk the tree of vm_area_struct and check for gaps or vma not > backed by struct page (VM_PFNMAP). The question is how to handle > VM_MIXEDMAP which may be either struct page or pfn backed... > > With discrete we are going to drop support for set_domain(), so offering > a way to probe the pages, without having to resort to dummy batches has > been requested. > > v2: > - add new query param for the PROBE flag, so userspace can easily > check if the kernel supports it(Jason). > - use mmap_read_{lock, unlock}. > - add some kernel-doc. > v3: > - In the docs also mention that PROBE doesn't guarantee that the pages > will remain valid by the time they are actually used(Tvrtko). > - Add a small comment for the hole finding logic(Jason). > - Move the param next to all the other params which just return true. 
> > Testcase: igt/gem_userptr_blits/probe > Signed-off-by: Chris Wilson > Signed-off-by: Matthew Auld > Cc: Thomas Hellström > Cc: Maarten Lankhorst > Cc: Tvrtko Ursulin > Cc: Jordan Justen > Cc: Kenneth Graunke > Cc: Jason Ekstrand > Cc: Daniel Vetter > Cc: Ramalingam C > Reviewed-by: Tvrtko Ursulin > Acked-by: Kenneth Graunke > Reviewed-by: Jason Ekstrand > --- > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 - > drivers/gpu/drm/i915/i915_getparam.c| 1 + > include/uapi/drm/i915_drm.h | 20 ++ > 3 files changed, 61 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > index 56edfeff8c02..468a7a617fbf 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops > i915_gem_userptr_ops = { > > #endif > > +static int > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > +{ > + const unsigned long end = addr + len; > + struct vm_area_struct *vma; > + int ret = -EFAULT; > + > + mmap_read_lock(mm); > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > + /* Check for holes, note that we also update the addr below */ > + if (vma->vm_start > addr) > + break; > + > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > + break; > + > + if (vma->vm_end >= end) { > + ret = 0; > + break; > + } > + > + addr = vma->vm_end; > + } > + mmap_read_unlock(mm); > + > + return ret; > +} > + > /* > * Creates a new mm object that wraps some normal memory from the process > * context - user memory. 
> @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > } > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > - I915_USERPTR_UNSYNCHRONIZED)) > + I915_USERPTR_UNSYNCHRONIZED | > + I915_USERPTR_PROBE)) > return -EINVAL; > > if (i915_gem_object_size_2big(args->user_size)) > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > return -ENODEV; > } > > + if (args->flags & I915_USERPTR_PROBE) { > + /* > +* Check that the range pointed to represents real struct > +* pages and not iomappings (at this moment in time!) > +*/ > + ret = probe_range(current->mm, args->user_ptr, > args->user_size); > + if (ret) > + return ret; > + } > + > #ifdef CONFIG_MMU_NOTIFIER > obj = i915_gem_object_alloc(); > if (obj == NULL) > diff --git a/drivers/gpu/drm/i915/i915_getparam.c > b/drivers/gpu/drm/i915/i915_getparam.c > index 24e18219eb50..bbb7cac43eb4 100644 > --- a/drivers/gpu/drm/i915/i915_getparam.c > +++ b/drivers/gpu/drm/i915/i915_getparam.c > @@ -134,6 +134,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void > *data, > case I915_PARAM_HAS_EXE
Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation
Are there IGTs for this anywhere? On Fri, Jul 23, 2021 at 12:47 PM Jason Ekstrand wrote: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044 > > On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld wrote: > > > > From: Chris Wilson > > > > Jason Ekstrand requested a more efficient method than userptr+set-domain > > to determine if the userptr object was backed by a complete set of pages > > upon creation. To be more efficient than simply populating the userptr > > using get_user_pages() (as done by the call to set-domain or execbuf), > > we can walk the tree of vm_area_struct and check for gaps or vma not > > backed by struct page (VM_PFNMAP). The question is how to handle > > VM_MIXEDMAP which may be either struct page or pfn backed... > > > > With discrete we are going to drop support for set_domain(), so offering > > a way to probe the pages, without having to resort to dummy batches has > > been requested. > > > > v2: > > - add new query param for the PROBE flag, so userspace can easily > > check if the kernel supports it(Jason). > > - use mmap_read_{lock, unlock}. > > - add some kernel-doc. > > v3: > > - In the docs also mention that PROBE doesn't guarantee that the pages > > will remain valid by the time they are actually used(Tvrtko). > > - Add a small comment for the hole finding logic(Jason). > > - Move the param next to all the other params which just return true. 
> > > > Testcase: igt/gem_userptr_blits/probe > > Signed-off-by: Chris Wilson > > Signed-off-by: Matthew Auld > > Cc: Thomas Hellström > > Cc: Maarten Lankhorst > > Cc: Tvrtko Ursulin > > Cc: Jordan Justen > > Cc: Kenneth Graunke > > Cc: Jason Ekstrand > > Cc: Daniel Vetter > > Cc: Ramalingam C > > Reviewed-by: Tvrtko Ursulin > > Acked-by: Kenneth Graunke > > Reviewed-by: Jason Ekstrand > > --- > > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 - > > drivers/gpu/drm/i915/i915_getparam.c| 1 + > > include/uapi/drm/i915_drm.h | 20 ++ > > 3 files changed, 61 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > index 56edfeff8c02..468a7a617fbf 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops > > i915_gem_userptr_ops = { > > > > #endif > > > > +static int > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > > +{ > > + const unsigned long end = addr + len; > > + struct vm_area_struct *vma; > > + int ret = -EFAULT; > > + > > + mmap_read_lock(mm); > > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > > + /* Check for holes, note that we also update the addr below > > */ > > + if (vma->vm_start > addr) > > + break; > > + > > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > > + break; > > + > > + if (vma->vm_end >= end) { > > + ret = 0; > > + break; > > + } > > + > > + addr = vma->vm_end; > > + } > > + mmap_read_unlock(mm); > > + > > + return ret; > > +} > > + > > /* > > * Creates a new mm object that wraps some normal memory from the process > > * context - user memory. 
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > } > > > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > > - I915_USERPTR_UNSYNCHRONIZED)) > > + I915_USERPTR_UNSYNCHRONIZED | > > + I915_USERPTR_PROBE)) > > return -EINVAL; > > > > if (i915_gem_object_size_2big(args->user_size)) > > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > return -ENODEV; > > } > > > > + if (args->flags & I915_USERPTR_PROBE) { > > + /* > > +* Check that the range pointed to represents real struct > > +* pages and not iomappings (at this moment in time!) > > +*/ > &
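The hole-finding walk in probe_range() above is easy to check in isolation. Below is a self-contained userspace sketch of the same logic, where a sorted array of [start, end) ranges stands in for the vm_area_struct list (the array, the function name, and the absence of locking are all simplifications for illustration — the real code walks VMAs under mmap_read_lock()):

```c
#include <assert.h>
#include <stddef.h>
#include <errno.h>

/* Mock stand-in for a VMA: a half-open address range [start, end). */
struct range { unsigned long start, end; };

/* Sketch of probe_range(): succeed only if [addr, addr + len) is fully
 * covered by consecutive mappings with no gaps; otherwise -EFAULT, as in
 * the patch. The "continue" emulates find_vma(), which returns the first
 * VMA whose end is above addr. */
static int probe_range_sketch(const struct range *vmas, size_t n,
			      unsigned long addr, unsigned long len)
{
	const unsigned long end = addr + len;
	size_t i;

	for (i = 0; i < n; i++) {
		if (vmas[i].end <= addr)
			continue;		/* mapping entirely below addr */

		/* Check for holes, note that we also update addr below */
		if (vmas[i].start > addr)
			return -EFAULT;

		if (vmas[i].end >= end)
			return 0;		/* range fully covered */

		addr = vmas[i].end;		/* advance past this mapping */
	}
	return -EFAULT;				/* ran out before reaching end */
}
```

Note that two adjacent mappings (end of one == start of the next) count as contiguous coverage, which matches the kernel loop: addr is advanced to vm_end and the next iteration only fails if vm_start is strictly above it.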
Re: [PATCH 00/30] Remove CNL support
Generally a big fan. 👍 --Jason On July 23, 2021 19:11:34 Lucas De Marchi wrote: Patches 1 and 2 are already being reviewed elsewhere. Discussion on 2nd patch made me revive something I started after comment from Ville at https://patchwork.freedesktop.org/patch/428168/?series=88988&rev=1#comment_768918 This removes CNL completely from the driver, while trying to rename functions and macros where appropriate (usually to GLK when dealing with display or with ICL otherwise). It starts with display, which is more straightforward, and then proceed to the rest of i915. diff stat removing 1600 lines of dead code seems to pay the pain of doing this. Lucas De Marchi (30): drm/i915: fix not reading DSC disable fuse in GLK drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs() drm/i915/display: remove PORT_F workaround for CNL drm/i915/display: remove explicit CNL handling from intel_cdclk.c drm/i915/display: remove explicit CNL handling from intel_color.c drm/i915/display: remove explicit CNL handling from intel_combo_phy.c drm/i915/display: remove explicit CNL handling from intel_crtc.c drm/i915/display: remove explicit CNL handling from intel_ddi.c drm/i915/display: remove explicit CNL handling from intel_display_debugfs.c drm/i915/display: remove explicit CNL handling from intel_dmc.c drm/i915/display: remove explicit CNL handling from intel_dp.c drm/i915/display: remove explicit CNL handling from intel_dpll_mgr.c drm/i915/display: remove explicit CNL handling from intel_vdsc.c drm/i915/display: remove explicit CNL handling from skl_universal_plane.c drm/i915/display: remove explicit CNL handling from intel_display_power.c drm/i915/display: remove CNL ddi buf translation tables drm/i915/display: rename CNL references in skl_scaler.c drm/i915: remove explicit CNL handling from i915_irq.c drm/i915: remove explicit CNL handling from intel_pm.c drm/i915: remove explicit CNL handling from intel_mocs.c drm/i915: remove explicit CNL handling from intel_pch.c 
drm/i915: remove explicit CNL handling from intel_wopcm.c drm/i915/gt: remove explicit CNL handling from intel_sseu.c drm/i915: rename CNL references in intel_dram.c drm/i915/gt: rename CNL references in intel_engine.h drm/i915: finish removal of CNL drm/i915: remove GRAPHICS_VER == 10 drm/i915: rename/remove CNL registers drm/i915: replace random CNL comments drm/i915: switch num_scalers/num_sprites to consider DISPLAY_VER drivers/gpu/drm/i915/display/intel_bios.c | 8 +- drivers/gpu/drm/i915/display/intel_cdclk.c| 72 +- drivers/gpu/drm/i915/display/intel_color.c| 5 +- .../gpu/drm/i915/display/intel_combo_phy.c| 106 +-- drivers/gpu/drm/i915/display/intel_crtc.c | 2 +- drivers/gpu/drm/i915/display/intel_ddi.c | 266 +--- .../drm/i915/display/intel_ddi_buf_trans.c| 616 +- .../drm/i915/display/intel_ddi_buf_trans.h| 4 +- drivers/gpu/drm/i915/display/intel_display.c | 3 +- .../drm/i915/display/intel_display_debugfs.c | 2 +- .../drm/i915/display/intel_display_power.c| 289 .../drm/i915/display/intel_display_power.h| 2 - drivers/gpu/drm/i915/display/intel_dmc.c | 9 - drivers/gpu/drm/i915/display/intel_dp.c | 35 +- drivers/gpu/drm/i915/display/intel_dp_aux.c | 1 - drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 586 +++-- drivers/gpu/drm/i915/display/intel_dpll_mgr.h | 1 - drivers/gpu/drm/i915/display/intel_vbt_defs.h | 2 +- drivers/gpu/drm/i915/display/intel_vdsc.c | 5 +- drivers/gpu/drm/i915/display/skl_scaler.c | 10 +- .../drm/i915/display/skl_universal_plane.c| 14 +- drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 1 - drivers/gpu/drm/i915/gt/debugfs_gt_pm.c | 10 +- drivers/gpu/drm/i915/gt/intel_engine.h| 2 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 - drivers/gpu/drm/i915/gt/intel_ggtt.c | 4 +- .../gpu/drm/i915/gt/intel_gt_clock_utils.c| 10 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 6 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 42 +- drivers/gpu/drm/i915/gt/intel_mocs.c | 2 +- drivers/gpu/drm/i915/gt/intel_rc6.c | 2 +- drivers/gpu/drm/i915/gt/intel_rps.c | 4 +- 
drivers/gpu/drm/i915/gt/intel_sseu.c | 79 --- drivers/gpu/drm/i915/gt/intel_sseu.h | 2 +- drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c | 6 +- drivers/gpu/drm/i915/gvt/gtt.c| 2 +- drivers/gpu/drm/i915/i915_debugfs.c | 6 +- drivers/gpu/drm/i915/i915_drv.h | 13 +- drivers/gpu/drm/i915/i915_irq.c | 7 +- drivers/gpu/drm/i915/i915_pci.c | 23 +- drivers/gpu/drm/i915/i915_perf.c | 22 +- drivers/gpu/drm/i915/i915_reg.h | 245 ++- drivers/gpu/drm/i915/intel_device_info.c | 23 +- drivers/gpu/drm/i915/intel_device_info.h | 4 +- drivers/gpu/drm/i915/intel_dram.c | 32 +- drivers
Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)
On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld wrote: > > On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand wrote: > > > > This patch series fixes an issue with discrete graphics on Intel where we > > allowed dma-buf import while leaving the object in local memory. This > > breaks down pretty badly if the import happened on a different physical > > device. > > > > v7: > > - Drop "drm/i915/gem/ttm: Place new BOs in the requested region" > > - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in > > i915_gem_dumb_create()" > > - Misc. review feedback from Matthew Auld > > v8: > > - Misc. review feedback from Matthew Auld > > v9: > > - Replace the i915/ttm patch with two that are hopefully more correct > > > > Jason Ekstrand (6): > > drm/i915/gem: Check object_can_migrate from object_migrate > > drm/i915/gem: Refactor placement setup for i915_gem_object_create* > > (v2) > > drm/i915/gem: Call i915_gem_flush_free_objects() in > > i915_gem_dumb_create() > > drm/i915/gem: Unify user object creation (v3) > > drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed > > drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails > > > > Thomas Hellström (2): > > drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8) > > drm/i915/gem: Migrate to system at dma-buf attach time (v7) > > Should I push the series? Yes, please. Do we have a solid testing plan for things like this that touch discrete? I tested with mesa+glxgears on my DG1 but haven't run anything more stressful. 
--Jason > > > > drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 > > drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 58 -- > > drivers/gpu/drm/i915/gem/i915_gem_object.c| 20 +- > > drivers/gpu/drm/i915/gem/i915_gem_object.h| 4 + > > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +- > > .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 190 +- > > .../drm/i915/gem/selftests/i915_gem_migrate.c | 15 -- > > 7 files changed, 341 insertions(+), 136 deletions(-) > > > > -- > > 2.31.1 > > > > ___ > > Intel-gfx mailing list > > intel-...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation
On Mon, Jul 26, 2021 at 3:06 AM Matthew Auld wrote: > > On Fri, 23 Jul 2021 at 18:48, Jason Ekstrand wrote: > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044 > > Cool, is that ready to go? i.e can we start merging the kernel + IGT side. Yes, it's all reviewed. Though, it sounds like Maarten had a comment so we should settle on that before landing. > > > > On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld wrote: > > > > > > From: Chris Wilson > > > > > > Jason Ekstrand requested a more efficient method than userptr+set-domain > > > to determine if the userptr object was backed by a complete set of pages > > > upon creation. To be more efficient than simply populating the userptr > > > using get_user_pages() (as done by the call to set-domain or execbuf), > > > we can walk the tree of vm_area_struct and check for gaps or vma not > > > backed by struct page (VM_PFNMAP). The question is how to handle > > > VM_MIXEDMAP which may be either struct page or pfn backed... > > > > > > With discrete we are going to drop support for set_domain(), so offering > > > a way to probe the pages, without having to resort to dummy batches has > > > been requested. > > > > > > v2: > > > - add new query param for the PROBE flag, so userspace can easily > > > check if the kernel supports it(Jason). > > > - use mmap_read_{lock, unlock}. > > > - add some kernel-doc. > > > v3: > > > - In the docs also mention that PROBE doesn't guarantee that the pages > > > will remain valid by the time they are actually used(Tvrtko). > > > - Add a small comment for the hole finding logic(Jason). > > > - Move the param next to all the other params which just return true. 
> > > > > > Testcase: igt/gem_userptr_blits/probe > > > Signed-off-by: Chris Wilson > > > Signed-off-by: Matthew Auld > > > Cc: Thomas Hellström > > > Cc: Maarten Lankhorst > > > Cc: Tvrtko Ursulin > > > Cc: Jordan Justen > > > Cc: Kenneth Graunke > > > Cc: Jason Ekstrand > > > Cc: Daniel Vetter > > > Cc: Ramalingam C > > > Reviewed-by: Tvrtko Ursulin > > > Acked-by: Kenneth Graunke > > > Reviewed-by: Jason Ekstrand > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 - > > > drivers/gpu/drm/i915/i915_getparam.c| 1 + > > > include/uapi/drm/i915_drm.h | 20 ++ > > > 3 files changed, 61 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > index 56edfeff8c02..468a7a617fbf 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops > > > i915_gem_userptr_ops = { > > > > > > #endif > > > > > > +static int > > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > > > +{ > > > + const unsigned long end = addr + len; > > > + struct vm_area_struct *vma; > > > + int ret = -EFAULT; > > > + > > > + mmap_read_lock(mm); > > > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > > > + /* Check for holes, note that we also update the addr > > > below */ > > > + if (vma->vm_start > addr) > > > + break; > > > + > > > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > > > + break; > > > + > > > + if (vma->vm_end >= end) { > > > + ret = 0; > > > + break; > > > + } > > > + > > > + addr = vma->vm_end; > > > + } > > > + mmap_read_unlock(mm); > > > + > > > + return ret; > > > +} > > > + > > > /* > > > * Creates a new mm object that wraps some normal memory from the process > > > * context - user memory. 
> > > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > > } > > > > > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > > > - I915_USER
Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation
On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst wrote: > > Op 23-07-2021 om 13:34 schreef Matthew Auld: > > From: Chris Wilson > > > > Jason Ekstrand requested a more efficient method than userptr+set-domain > > to determine if the userptr object was backed by a complete set of pages > > upon creation. To be more efficient than simply populating the userptr > > using get_user_pages() (as done by the call to set-domain or execbuf), > > we can walk the tree of vm_area_struct and check for gaps or vma not > > backed by struct page (VM_PFNMAP). The question is how to handle > > VM_MIXEDMAP which may be either struct page or pfn backed... > > > > With discrete we are going to drop support for set_domain(), so offering > > a way to probe the pages, without having to resort to dummy batches has > > been requested. > > > > v2: > > - add new query param for the PROBE flag, so userspace can easily > > check if the kernel supports it(Jason). > > - use mmap_read_{lock, unlock}. > > - add some kernel-doc. > > v3: > > - In the docs also mention that PROBE doesn't guarantee that the pages > > will remain valid by the time they are actually used(Tvrtko). > > - Add a small comment for the hole finding logic(Jason). > > - Move the param next to all the other params which just return true. 
> > > > Testcase: igt/gem_userptr_blits/probe > > Signed-off-by: Chris Wilson > > Signed-off-by: Matthew Auld > > Cc: Thomas Hellström > > Cc: Maarten Lankhorst > > Cc: Tvrtko Ursulin > > Cc: Jordan Justen > > Cc: Kenneth Graunke > > Cc: Jason Ekstrand > > Cc: Daniel Vetter > > Cc: Ramalingam C > > Reviewed-by: Tvrtko Ursulin > > Acked-by: Kenneth Graunke > > Reviewed-by: Jason Ekstrand > > --- > > drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 - > > drivers/gpu/drm/i915/i915_getparam.c| 1 + > > include/uapi/drm/i915_drm.h | 20 ++ > > 3 files changed, 61 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > index 56edfeff8c02..468a7a617fbf 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops > > i915_gem_userptr_ops = { > > > > #endif > > > > +static int > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len) > > +{ > > + const unsigned long end = addr + len; > > + struct vm_area_struct *vma; > > + int ret = -EFAULT; > > + > > + mmap_read_lock(mm); > > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > > + /* Check for holes, note that we also update the addr below */ > > + if (vma->vm_start > addr) > > + break; > > + > > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > > + break; > > + > > + if (vma->vm_end >= end) { > > + ret = 0; > > + break; > > + } > > + > > + addr = vma->vm_end; > > + } > > + mmap_read_unlock(mm); > > + > > + return ret; > > +} > > + > > /* > > * Creates a new mm object that wraps some normal memory from the process > > * context - user memory. 
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > } > > > > if (args->flags & ~(I915_USERPTR_READ_ONLY | > > - I915_USERPTR_UNSYNCHRONIZED)) > > + I915_USERPTR_UNSYNCHRONIZED | > > + I915_USERPTR_PROBE)) > > return -EINVAL; > > > > if (i915_gem_object_size_2big(args->user_size)) > > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev, > > return -ENODEV; > > } > > > > + if (args->flags & I915_USERPTR_PROBE) { > > + /* > > + * Check that the range pointed to represents real struct > > + * pages and not iomappings (at this moment in time!) > > + */ > > + ret = probe_range(current->mm, args->user_ptr, > > args->user_size); > > + if (ret) > > + return ret; > > + } > > +
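The flags check quoted above is also why the new GETPARAM matters for userspace: a kernel without this patch sees I915_USERPTR_PROBE as an unknown bit and rejects the whole ioctl with -EINVAL. A small sketch of that mask validation (the flag values below are taken from the upstream uapi header as I understand it — double-check against your own include/uapi/drm/i915_drm.h before relying on them):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values believed to match include/uapi/drm/i915_drm.h;
 * listed here purely for illustration. */
#define I915_USERPTR_READ_ONLY		0x1u
#define I915_USERPTR_PROBE		0x2u
#define I915_USERPTR_UNSYNCHRONIZED	0x80000000u

/* Mirrors the kernel's validation in i915_gem_userptr_ioctl(): any bit
 * outside the known set fails the ioctl with -EINVAL. On kernels without
 * the probe patch, I915_USERPTR_PROBE itself is such an unknown bit. */
static int validate_userptr_flags(uint32_t flags)
{
	if (flags & ~(I915_USERPTR_READ_ONLY |
		      I915_USERPTR_UNSYNCHRONIZED |
		      I915_USERPTR_PROBE))
		return -EINVAL;
	return 0;
}
```

In practice userspace should query the new param first and only set the PROBE bit when the kernel reports support, rather than probing for -EINVAL.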
Re: [PATCH 01/10] drm/i915: Check for nomodeset in i915_init() first
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > When modesetting (aka the full pci driver, which has nothing to do > with disable_display option, which just gives you the full pci driver > without the display driver) is disabled, we load nothing and do > nothing. > > So move that check first, for a bit of orderliness. With Jason's > module init/exit table this now becomes trivial. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter Reviewed-by: Jason Ekstrand > --- > drivers/gpu/drm/i915/i915_pci.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 48ea23dd3b5b..0deaeeba2347 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -1292,9 +1292,9 @@ static const struct { > int (*init)(void); > void (*exit)(void); > } init_funcs[] = { > + { i915_check_nomodeset, NULL }, > { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, NULL }, > - { i915_check_nomodeset, NULL }, > { i915_pmu_init, i915_pmu_exit }, > { i915_register_pci_driver, i915_unregister_pci_driver }, > { i915_perf_sysctl_register, i915_perf_sysctl_unregister }, > -- > 2.32.0 >
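The init_funcs[] table this patch reorders pairs each init with an optional exit, called in table order, with already-initialized entries unwound in reverse if a later init fails. A self-contained sketch of that pattern (function and type names here are illustrative, not the actual i915 symbols; the call-order bookkeeping exists only so the behavior can be observed):

```c
#include <assert.h>
#include <stddef.h>

/* Records the order of init/exit calls: positive = init, negative = exit. */
static int order[8], n_calls;

static int init_a(void)   { order[n_calls++] = 1;  return 0; }
static void exit_a(void)  { order[n_calls++] = -1; }
static int init_b(void)   { order[n_calls++] = 2;  return 0; }
static void exit_b(void)  { order[n_calls++] = -2; }
static int init_fail(void){ return -1; }

struct init_entry {
	int (*init)(void);
	void (*exit)(void);	/* NULL for entries with nothing to undo */
};

/* Run each init in order; on failure, tear down the entries that
 * succeeded, in reverse order. The failing entry's exit is NOT called. */
static int run_init(const struct init_entry *funcs, int count)
{
	int i, err;

	for (i = 0; i < count; i++) {
		if (!funcs[i].init)
			continue;
		err = funcs[i].init();
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	while (i--)
		if (funcs[i].exit)
			funcs[i].exit();
	return err;
}
```

Ordering matters in such a table — which is exactly why this patch moves the nomodeset check to the first slot: nothing after it should run (or need unwinding) when the driver is disabled.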
Re: [PATCH 02/10] drm/i915: move i915_active slab to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_cache to just a slab_cache. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_active.c | 31 ++--- > drivers/gpu/drm/i915/i915_active.h | 3 +++ > drivers/gpu/drm/i915/i915_globals.c | 2 -- > drivers/gpu/drm/i915/i915_globals.h | 1 - > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > 5 files changed, 16 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_active.c > b/drivers/gpu/drm/i915/i915_active.c > index 91723123ae9f..9ffeb77eb5bb 100644 > --- a/drivers/gpu/drm/i915/i915_active.c > +++ b/drivers/gpu/drm/i915/i915_active.c > @@ -13,7 +13,6 @@ > > #include "i915_drv.h" > #include "i915_active.h" > -#include "i915_globals.h" > > /* > * Active refs memory management > @@ -22,10 +21,7 @@ > * they idle (when we know the active requests are inactive) and allocate the > * nodes from a local slab cache to hopefully reduce the fragmentation. > */ > -static struct i915_global_active { > - struct i915_global base; > - struct kmem_cache *slab_cache; > -} global; > +struct kmem_cache *slab_cache; static? Or were you planning to expose it somehow? 
With that fixed, Reviewed-by: Jason Ekstrand > > struct active_node { > struct rb_node node; > @@ -174,7 +170,7 @@ __active_retire(struct i915_active *ref) > /* Finally free the discarded timeline tree */ > rbtree_postorder_for_each_entry_safe(it, n, &root, node) { > GEM_BUG_ON(i915_active_fence_isset(&it->base)); > - kmem_cache_free(global.slab_cache, it); > + kmem_cache_free(slab_cache, it); > } > } > > @@ -322,7 +318,7 @@ active_instance(struct i915_active *ref, u64 idx) > * XXX: We should preallocate this before i915_active_ref() is ever > * called, but we cannot call into fs_reclaim() anyway, so use > GFP_ATOMIC. > */ > - node = kmem_cache_alloc(global.slab_cache, GFP_ATOMIC); > + node = kmem_cache_alloc(slab_cache, GFP_ATOMIC); > if (!node) > goto out; > > @@ -788,7 +784,7 @@ void i915_active_fini(struct i915_active *ref) > mutex_destroy(&ref->mutex); > > if (ref->cache) > - kmem_cache_free(global.slab_cache, ref->cache); > + kmem_cache_free(slab_cache, ref->cache); > } > > static inline bool is_idle_barrier(struct active_node *node, u64 idx) > @@ -908,7 +904,7 @@ int i915_active_acquire_preallocate_barrier(struct > i915_active *ref, > node = reuse_idle_barrier(ref, idx); > rcu_read_unlock(); > if (!node) { > - node = kmem_cache_alloc(global.slab_cache, > GFP_KERNEL); > + node = kmem_cache_alloc(slab_cache, GFP_KERNEL); > if (!node) > goto unwind; > > @@ -956,7 +952,7 @@ int i915_active_acquire_preallocate_barrier(struct > i915_active *ref, > atomic_dec(&ref->count); > intel_engine_pm_put(barrier_to_engine(node)); > > - kmem_cache_free(global.slab_cache, node); > + kmem_cache_free(slab_cache, node); > } > return -ENOMEM; > } > @@ -1176,21 +1172,16 @@ struct i915_active *i915_active_create(void) > #include "selftests/i915_active.c" > #endif > > -static void i915_global_active_exit(void) > +void i915_active_module_exit(void) > { > - kmem_cache_destroy(global.slab_cache); > + kmem_cache_destroy(slab_cache); > } > > -static struct i915_global_active global = { { 
> - .exit = i915_global_active_exit, > -} }; > - > -int __init i915_global_active_init(void) > +int __init i915_active_module_init(void) > { > - global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN); > - if (!global.slab_cache) > + slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN); > + if (!slab_cache) > return -ENOMEM; > > - i915_global_register(&global.base); > return 0; > } > diff --git a/drivers/gpu/drm/i915/i915_active.h > b/drivers/gpu/drm/i915/i915_active.h > index d0feda68b874..5fcdb0e2bc9e 100644 > --- a/drivers/gpu/drm/i915/i915_active.h &g
Re: [PATCH 03/10] drm/i915: move i915_buddy slab to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_blocks to just a > slab_blocks. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_buddy.c | 25 - > drivers/gpu/drm/i915/i915_buddy.h | 3 ++- > drivers/gpu/drm/i915/i915_globals.c | 2 -- > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > 4 files changed, 12 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_buddy.c > b/drivers/gpu/drm/i915/i915_buddy.c > index caabcaea3be7..045d00c43b4c 100644 > --- a/drivers/gpu/drm/i915/i915_buddy.c > +++ b/drivers/gpu/drm/i915/i915_buddy.c > @@ -8,13 +8,9 @@ > #include "i915_buddy.h" > > #include "i915_gem.h" > -#include "i915_globals.h" > #include "i915_utils.h" > > -static struct i915_global_buddy { > - struct i915_global base; > - struct kmem_cache *slab_blocks; > -} global; > +struct kmem_cache *slab_blocks; static? 
With that fixed, Reviewed-by: Jason Ekstrand > > static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm, > struct i915_buddy_block > *parent, > @@ -25,7 +21,7 @@ static struct i915_buddy_block *i915_block_alloc(struct > i915_buddy_mm *mm, > > GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER); > > - block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL); > + block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL); > if (!block) > return NULL; > > @@ -40,7 +36,7 @@ static struct i915_buddy_block *i915_block_alloc(struct > i915_buddy_mm *mm, > static void i915_block_free(struct i915_buddy_mm *mm, > struct i915_buddy_block *block) > { > - kmem_cache_free(global.slab_blocks, block); > + kmem_cache_free(slab_blocks, block); > } > > static void mark_allocated(struct i915_buddy_block *block) > @@ -410,21 +406,16 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm, > #include "selftests/i915_buddy.c" > #endif > > -static void i915_global_buddy_exit(void) > +void i915_buddy_module_exit(void) > { > - kmem_cache_destroy(global.slab_blocks); > + kmem_cache_destroy(slab_blocks); > } > > -static struct i915_global_buddy global = { { > - .exit = i915_global_buddy_exit, > -} }; > - > -int __init i915_global_buddy_init(void) > +int __init i915_buddy_module_init(void) > { > - global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0); > - if (!global.slab_blocks) > + slab_blocks = KMEM_CACHE(i915_buddy_block, 0); > + if (!slab_blocks) > return -ENOMEM; > > - i915_global_register(&global.base); > return 0; > } > diff --git a/drivers/gpu/drm/i915/i915_buddy.h > b/drivers/gpu/drm/i915/i915_buddy.h > index d8f26706de52..3940d632f208 100644 > --- a/drivers/gpu/drm/i915/i915_buddy.h > +++ b/drivers/gpu/drm/i915/i915_buddy.h > @@ -129,6 +129,7 @@ void i915_buddy_free(struct i915_buddy_mm *mm, struct > i915_buddy_block *block); > > void i915_buddy_free_list(struct i915_buddy_mm *mm, struct list_head > *objects); > > -int i915_global_buddy_init(void); > +void 
i915_buddy_module_exit(void); > +int i915_buddy_module_init(void); > > #endif > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index a53135ee831d..3de7cf22ec76 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -7,7 +7,6 @@ > #include > #include > > -#include "i915_buddy.h" > #include "gem/i915_gem_context.h" > #include "gem/i915_gem_object.h" > #include "i915_globals.h" > @@ -33,7 +32,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_buddy_init, > i915_global_context_init, > i915_global_gem_context_init, > i915_global_objects_init, > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 6ee77a8f43d6..f9527269e30a 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -31,6 +31,7 @@ > #include "display/intel_fbdev.h" > > #include "i915_active.h" > +#include "i915_buddy.h" > #include "i915_drv.h" > #include "i915_perf.h" > #include "i915_globals.h" > @@ -1295,6 +1296,7 @@ static const struct { > } init_funcs[] = { > { i915_check_nomodeset, NULL }, > { i915_active_module_init, i915_active_module_exit }, > + { i915_buddy_module_init, i915_buddy_module_exit }, > { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, NULL }, > { i915_pmu_init, i915_pmu_exit }, > -- > 2.32.0 >
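The conversion pattern in patches 2-5 is the same each time: the `struct i915_global` wrapper goes away, the kmem_cache becomes a file-scope variable (which, per the review comments, should be `static`), and creation/destruction move to explicit module init/exit hooks. A minimal userspace stand-in for that lifecycle, with malloc playing the role of the slab allocator (all names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Fake "cache": just remembers the object size it hands out. */
struct obj_cache { size_t size; };

/* File-scope static, as requested in review -- no global wrapper struct. */
static struct obj_cache *slab_blocks;

/* Analogue of i915_buddy_module_init(): create the cache once, fail
 * with -ENOMEM if allocation fails. */
static int buddy_module_init_sketch(void)
{
	slab_blocks = malloc(sizeof(*slab_blocks));
	if (!slab_blocks)
		return -ENOMEM;
	slab_blocks->size = 128;
	return 0;
}

/* Analogue of kmem_cache_zalloc(slab_blocks, ...). */
static void *block_alloc_sketch(void)
{
	return slab_blocks ? calloc(1, slab_blocks->size) : NULL;
}

/* Analogue of i915_buddy_module_exit(): destroy the cache. */
static void buddy_module_exit_sketch(void)
{
	free(slab_blocks);
	slab_blocks = NULL;
}
```

The point of the series is that with the shrink infrastructure gone, nothing beyond this create-at-init / destroy-at-exit pairing is needed, so the extra i915_globals registration layer can be deleted entry by entry.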
Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit
On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin wrote: > > > On 23/07/2021 20:29, Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > > special and we can convert them over. > > > > I'm doing this split up into each patch because there's quite a bit of > > noise with removing the static global.slab_ce to just a > > slab_ce. > > > > Cc: Jason Ekstrand > > Signed-off-by: Daniel Vetter > > --- > > drivers/gpu/drm/i915/gt/intel_context.c | 25 - > > drivers/gpu/drm/i915/gt/intel_context.h | 3 +++ > > drivers/gpu/drm/i915/i915_globals.c | 2 -- > > drivers/gpu/drm/i915/i915_globals.h | 1 - > > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > > 5 files changed, 13 insertions(+), 20 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > b/drivers/gpu/drm/i915/gt/intel_context.c > > index baa05fddd690..283382549a6f 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > @@ -7,7 +7,6 @@ > > #include "gem/i915_gem_pm.h" > > > > #include "i915_drv.h" > > -#include "i915_globals.h" > > #include "i915_trace.h" > > > > #include "intel_context.h" > > @@ -15,14 +14,11 @@ > > #include "intel_engine_pm.h" > > #include "intel_ring.h" > > > > -static struct i915_global_context { > > - struct i915_global base; > > - struct kmem_cache *slab_ce; > > -} global; > > +struct kmem_cache *slab_ce; Static? 
With that, Reviewed-by: Jason Ekstrand > > > > static struct intel_context *intel_context_alloc(void) > > { > > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); > > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL); > > } > > > > static void rcu_context_free(struct rcu_head *rcu) > > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu) > > struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); > > > > trace_intel_context_free(ce); > > - kmem_cache_free(global.slab_ce, ce); > > + kmem_cache_free(slab_ce, ce); > > } > > > > void intel_context_free(struct intel_context *ce) > > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce) > > i915_active_fini(&ce->active); > > } > > > > -static void i915_global_context_exit(void) > > +void i915_context_module_exit(void) > > { > > - kmem_cache_destroy(global.slab_ce); > > + kmem_cache_destroy(slab_ce); > > } > > > > -static struct i915_global_context global = { { > > - .exit = i915_global_context_exit, > > -} }; > > - > > -int __init i915_global_context_init(void) > > +int __init i915_context_module_init(void) > > { > > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > > - if (!global.slab_ce) > > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > > + if (!slab_ce) > > return -ENOMEM; > > > > - i915_global_register(&global.base); > > return 0; > > } > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > > b/drivers/gpu/drm/i915/gt/intel_context.h > > index 974ef85320c2..a0ca82e3c40d 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.h > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce, > > struct intel_engine_cs *engine); > > void intel_context_fini(struct intel_context *ce); > > > > +void i915_context_module_exit(void); > > +int i915_context_module_init(void); > > + > > struct intel_context * > > intel_context_create(struct intel_engine_cs *engine); > > > > diff --git 
a/drivers/gpu/drm/i915/i915_globals.c > > b/drivers/gpu/drm/i915/i915_globals.c > > index 3de7cf22ec76..d36eb7dc40aa 100644 > > --- a/drivers/gpu/drm/i915/i915_globals.c > > +++ b/drivers/gpu/drm/i915/i915_globals.c > > @@ -7,7 +7,6 @@ > > #include > > #include > > > > -#include "gem/i915_gem_context.h" > > #include "gem/i915_gem_object.h" > > #include "i915_globals.h" > > #include "i915_request.h" > > @@ -32,7 +31,6 @@ static void __i915_globals_cleanup(void) > > } > > > > static __initconst int (* const
Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)
On Mon, Jul 26, 2021 at 10:29 AM Matthew Auld wrote: > > On Mon, 26 Jul 2021 at 16:11, Jason Ekstrand wrote: > > > > On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld > > wrote: > > > > > > On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand wrote: > > > > > > > > This patch series fixes an issue with discrete graphics on Intel where > > > > we > > > > allowed dma-buf import while leaving the object in local memory. This > > > > breaks down pretty badly if the import happened on a different physical > > > > device. > > > > > > > > v7: > > > > - Drop "drm/i915/gem/ttm: Place new BOs in the requested region" > > > > - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in > > > > i915_gem_dumb_create()" > > > > - Misc. review feedback from Matthew Auld > > > > v8: > > > > - Misc. review feedback from Matthew Auld > > > > v9: > > > > - Replace the i915/ttm patch with two that are hopefully more correct > > > > > > > > Jason Ekstrand (6): > > > > drm/i915/gem: Check object_can_migrate from object_migrate > > > > drm/i915/gem: Refactor placement setup for i915_gem_object_create* > > > > (v2) > > > > drm/i915/gem: Call i915_gem_flush_free_objects() in > > > > i915_gem_dumb_create() > > > > drm/i915/gem: Unify user object creation (v3) > > > > drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed > > > > drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails > > > > > > > > Thomas Hellström (2): > > > > drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8) > > > > drm/i915/gem: Migrate to system at dma-buf attach time (v7) > > > > > > Should I push the series? > > > > Yes, please. Do we have a solid testing plan for things like this > > that touch discrete? I tested with mesa+glxgears on my DG1 but > > haven't run anything more stressful. > > I think all we really have are the migration related selftests, and CI > is not even running them on DG1 due to other breakage. Assuming you > ran these locally, I think we just merge the series? 
Works for me. Yes, I ran them on my TGL+DG1 box. I've also tested both GL and Vulkan PRIME support with the client running on DG1 and the compositor running on TGL with this series and everything works smooth. --Jason > > > > --Jason > > > > > > > > > > > > drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 > > > > drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 58 -- > > > > drivers/gpu/drm/i915/gem/i915_gem_object.c| 20 +- > > > > drivers/gpu/drm/i915/gem/i915_gem_object.h| 4 + > > > > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +- > > > > .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 190 +- > > > > .../drm/i915/gem/selftests/i915_gem_migrate.c | 15 -- > > > > 7 files changed, 341 insertions(+), 136 deletions(-) > > > > > > > > -- > > > > 2.31.1 > > > > > > > > ___ > > > > Intel-gfx mailing list > > > > intel-...@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [PATCH 05/10] drm/i915: move gem_context slab to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_luts to just a > slab_luts. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 25 +++-- > drivers/gpu/drm/i915/gem/i915_gem_context.h | 3 +++ > drivers/gpu/drm/i915/i915_globals.c | 2 -- > drivers/gpu/drm/i915/i915_globals.h | 1 - > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > 5 files changed, 13 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 89ca401bf9ae..c17c28af1e57 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -79,25 +79,21 @@ > #include "gt/intel_ring.h" > > #include "i915_gem_context.h" > -#include "i915_globals.h" > #include "i915_trace.h" > #include "i915_user_extensions.h" > > #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 > > -static struct i915_global_gem_context { > - struct i915_global base; > - struct kmem_cache *slab_luts; > -} global; > +struct kmem_cache *slab_luts; static. 
With that, Reviewed-by: Jason Ekstrand > struct i915_lut_handle *i915_lut_handle_alloc(void) > { > - return kmem_cache_alloc(global.slab_luts, GFP_KERNEL); > + return kmem_cache_alloc(slab_luts, GFP_KERNEL); > } > > void i915_lut_handle_free(struct i915_lut_handle *lut) > { > - return kmem_cache_free(global.slab_luts, lut); > + return kmem_cache_free(slab_luts, lut); > } > > static void lut_close(struct i915_gem_context *ctx) > @@ -2282,21 +2278,16 @@ i915_gem_engines_iter_next(struct > i915_gem_engines_iter *it) > #include "selftests/i915_gem_context.c" > #endif > > -static void i915_global_gem_context_exit(void) > +void i915_gem_context_module_exit(void) > { > - kmem_cache_destroy(global.slab_luts); > + kmem_cache_destroy(slab_luts); > } > > -static struct i915_global_gem_context global = { { > - .exit = i915_global_gem_context_exit, > -} }; > - > -int __init i915_global_gem_context_init(void) > +int __init i915_gem_context_module_init(void) > { > - global.slab_luts = KMEM_CACHE(i915_lut_handle, 0); > - if (!global.slab_luts) > + slab_luts = KMEM_CACHE(i915_lut_handle, 0); > + if (!slab_luts) > return -ENOMEM; > > - i915_global_register(&global.base); > return 0; > } > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h > b/drivers/gpu/drm/i915/gem/i915_gem_context.h > index 20411db84914..18060536b0c2 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h > @@ -224,6 +224,9 @@ i915_gem_engines_iter_next(struct i915_gem_engines_iter > *it); > for (i915_gem_engines_iter_init(&(it), (engines)); \ > ((ce) = i915_gem_engines_iter_next(&(it)));) > > +void i915_gem_context_module_exit(void); > +int i915_gem_context_module_init(void); > + > struct i915_lut_handle *i915_lut_handle_alloc(void); > void i915_lut_handle_free(struct i915_lut_handle *lut); > > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index d36eb7dc40aa..dbb3d81eeea7 100644 > --- 
a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -7,7 +7,6 @@ > #include > #include > > -#include "gem/i915_gem_object.h" > #include "i915_globals.h" > #include "i915_request.h" > #include "i915_scheduler.h" > @@ -31,7 +30,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_gem_context_init, > i915_global_objects_init, > i915_global_request_init, > i915_global_scheduler_init, > diff --git a/drivers/gpu/drm/i915/i915_globals.h > b/drivers/gpu/drm/i915/i915_globals.h > index 60daa738a188..f16752dbbdbf 100644 > --- a/drivers/gpu/drm/i915/i915_globals.h > +++ b/drivers/gpu/drm/i915/i915_globals.h > @@ -23,7 +23,6 @@ int i915_globals_init(void); > void i915_globals_exit(void); > > /* constructors */ > -int i915_global_gem_context_init(void); > int i915_global_objects_init(void); > int i915_global_request
Re: [PATCH 06/10] drm/i915: move gem_objects slab to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_objects to just a > slab_objects. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 26 +++--- > drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 +++ > drivers/gpu/drm/i915/i915_globals.c| 1 - > drivers/gpu/drm/i915/i915_globals.h| 1 - > drivers/gpu/drm/i915/i915_pci.c| 1 + > 5 files changed, 12 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c > b/drivers/gpu/drm/i915/gem/i915_gem_object.c > index 5c21cff33199..53156250d283 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > @@ -30,14 +30,10 @@ > #include "i915_gem_context.h" > #include "i915_gem_mman.h" > #include "i915_gem_object.h" > -#include "i915_globals.h" > #include "i915_memcpy.h" > #include "i915_trace.h" > > -static struct i915_global_object { > - struct i915_global base; > - struct kmem_cache *slab_objects; > -} global; > +struct kmem_cache *slab_objects; static With that, Reviewed-by: Jason Ekstrand > static const struct drm_gem_object_funcs i915_gem_object_funcs; > > @@ -45,7 +41,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void) > { > struct drm_i915_gem_object *obj; > > - obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL); > + obj = kmem_cache_zalloc(slab_objects, GFP_KERNEL); > if (!obj) > return NULL; > obj->base.funcs = &i915_gem_object_funcs; > @@ -55,7 +51,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void) > > void i915_gem_object_free(struct drm_i915_gem_object *obj) > { > - return kmem_cache_free(global.slab_objects, obj); > + return kmem_cache_free(slab_objects, obj); > } > > void i915_gem_object_init(struct drm_i915_gem_object 
*obj, > @@ -664,23 +660,17 @@ void i915_gem_init__objects(struct drm_i915_private > *i915) > INIT_WORK(&i915->mm.free_work, __i915_gem_free_work); > } > > -static void i915_global_objects_exit(void) > +void i915_objects_module_exit(void) > { > - kmem_cache_destroy(global.slab_objects); > + kmem_cache_destroy(slab_objects); > } > > -static struct i915_global_object global = { { > - .exit = i915_global_objects_exit, > -} }; > - > -int __init i915_global_objects_init(void) > +int __init i915_objects_module_init(void) > { > - global.slab_objects = > - KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN); > - if (!global.slab_objects) > + slab_objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN); > + if (!slab_objects) > return -ENOMEM; > > - i915_global_register(&global.base); > return 0; > } > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h > b/drivers/gpu/drm/i915/gem/i915_gem_object.h > index f3ede43282dc..6d8ea62a372f 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h > @@ -48,6 +48,9 @@ static inline bool i915_gem_object_size_2big(u64 size) > > void i915_gem_init__objects(struct drm_i915_private *i915); > > +void i915_objects_module_exit(void); > +int i915_objects_module_init(void); > + > struct drm_i915_gem_object *i915_gem_object_alloc(void); > void i915_gem_object_free(struct drm_i915_gem_object *obj); > > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index dbb3d81eeea7..40a592fbc3e0 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -30,7 +30,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_objects_init, > i915_global_request_init, > i915_global_scheduler_init, > i915_global_vma_init, > diff --git a/drivers/gpu/drm/i915/i915_globals.h > b/drivers/gpu/drm/i915/i915_globals.h > index f16752dbbdbf..9734740708f4 100644 > --- 
a/drivers/gpu/drm/i915/i915_globals.h > +++ b/drivers/gpu/drm/i915/i915_globals.h > @@ -23,7 +23,6 @@ int i915_globals_init(void); > void i915_globals_exit(void); > > /* constructors */ > -int i915_global_objects_init(void); > int i915_global_r
Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit
On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand wrote: > > On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin > wrote: > > > > > > On 23/07/2021 20:29, Daniel Vetter wrote: > > > With the global kmem_cache shrink infrastructure gone there's nothing > > > special and we can convert them over. > > > > > > I'm doing this split up into each patch because there's quite a bit of > > > noise with removing the static global.slab_ce to just a > > > slab_ce. > > > > > > Cc: Jason Ekstrand > > > Signed-off-by: Daniel Vetter > > > --- > > > drivers/gpu/drm/i915/gt/intel_context.c | 25 - > > > drivers/gpu/drm/i915/gt/intel_context.h | 3 +++ > > > drivers/gpu/drm/i915/i915_globals.c | 2 -- > > > drivers/gpu/drm/i915/i915_globals.h | 1 - > > > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > > > 5 files changed, 13 insertions(+), 20 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > > b/drivers/gpu/drm/i915/gt/intel_context.c > > > index baa05fddd690..283382549a6f 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > > @@ -7,7 +7,6 @@ > > > #include "gem/i915_gem_pm.h" > > > > > > #include "i915_drv.h" > > > -#include "i915_globals.h" > > > #include "i915_trace.h" > > > > > > #include "intel_context.h" > > > @@ -15,14 +14,11 @@ > > > #include "intel_engine_pm.h" > > > #include "intel_ring.h" > > > > > > -static struct i915_global_context { > > > - struct i915_global base; > > > - struct kmem_cache *slab_ce; > > > -} global; > > > +struct kmem_cache *slab_ce; > > Static? 
With that, > > Reviewed-by: Jason Ekstrand > > > > > > > static struct intel_context *intel_context_alloc(void) > > > { > > > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); > > > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL); > > > } > > > > > > static void rcu_context_free(struct rcu_head *rcu) > > > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu) > > > struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); > > > > > > trace_intel_context_free(ce); > > > - kmem_cache_free(global.slab_ce, ce); > > > + kmem_cache_free(slab_ce, ce); > > > } > > > > > > void intel_context_free(struct intel_context *ce) > > > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce) > > > i915_active_fini(&ce->active); > > > } > > > > > > -static void i915_global_context_exit(void) > > > +void i915_context_module_exit(void) > > > { > > > - kmem_cache_destroy(global.slab_ce); > > > + kmem_cache_destroy(slab_ce); > > > } > > > > > > -static struct i915_global_context global = { { > > > - .exit = i915_global_context_exit, > > > -} }; > > > - > > > -int __init i915_global_context_init(void) > > > +int __init i915_context_module_init(void) > > > { > > > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > > > - if (!global.slab_ce) > > > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > > > + if (!slab_ce) > > > return -ENOMEM; > > > > > > - i915_global_register(&global.base); > > > return 0; > > > } > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > > > b/drivers/gpu/drm/i915/gt/intel_context.h > > > index 974ef85320c2..a0ca82e3c40d 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_context.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > > > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce, > > > struct intel_engine_cs *engine); > > > void intel_context_fini(struct intel_context *ce); > > > > > > +void i915_context_module_exit(void); > > > +int 
i915_context_module_init(void); > > > + > > > struct intel_context * > > > intel_context_create(struct intel_engine_cs *engine); > > > > > > diff --git a/drivers/gpu/drm/i915/i915_
Re: [PATCH 07/10] drm/i915: move request slabs to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_requests|execute_cbs to just a > slab_requests|execute_cbs. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_globals.c | 2 -- > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > drivers/gpu/drm/i915/i915_request.c | 47 - > drivers/gpu/drm/i915/i915_request.h | 3 ++ > 4 files changed, 24 insertions(+), 30 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index 40a592fbc3e0..8fffa8d93bc5 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -8,7 +8,6 @@ > #include > > #include "i915_globals.h" > -#include "i915_request.h" > #include "i915_scheduler.h" > #include "i915_vma.h" > > @@ -30,7 +29,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_request_init, > i915_global_scheduler_init, > i915_global_vma_init, > }; > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 2334eb3e9abb..bb2bd12fb8c2 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -35,6 +35,7 @@ > #include "i915_drv.h" > #include "gem/i915_gem_context.h" > #include "gem/i915_gem_object.h" > +#include "i915_request.h" > #include "i915_perf.h" > #include "i915_globals.h" > #include "i915_selftest.h" > @@ -1302,6 +1303,7 @@ static const struct { > { i915_context_module_init, i915_context_module_exit }, > { i915_gem_context_module_init, i915_gem_context_module_exit }, > { i915_objects_module_init, i915_objects_module_exit }, > + { i915_request_module_init, i915_request_module_exit }, > { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, 
NULL }, > { i915_pmu_init, i915_pmu_exit }, > diff --git a/drivers/gpu/drm/i915/i915_request.c > b/drivers/gpu/drm/i915/i915_request.c > index 6594cb2f8ebd..69152369ea00 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -42,7 +42,6 @@ > > #include "i915_active.h" > #include "i915_drv.h" > -#include "i915_globals.h" > #include "i915_trace.h" > #include "intel_pm.h" > > @@ -52,11 +51,8 @@ struct execute_cb { > struct i915_request *signal; > }; > > -static struct i915_global_request { > - struct i915_global base; > - struct kmem_cache *slab_requests; > - struct kmem_cache *slab_execute_cbs; > -} global; > +struct kmem_cache *slab_requests; static > +struct kmem_cache *slab_execute_cbs; static Am I tired of typing this? Yes, I am! Will I keep typing it? Probably. :-P > > static const char *i915_fence_get_driver_name(struct dma_fence *fence) > { > @@ -107,7 +103,7 @@ static signed long i915_fence_wait(struct dma_fence > *fence, > > struct kmem_cache *i915_request_slab_cache(void) > { > - return global.slab_requests; > + return slab_requests; > } > > static void i915_fence_release(struct dma_fence *fence) > @@ -159,7 +155,7 @@ static void i915_fence_release(struct dma_fence *fence) > !cmpxchg(&rq->engine->request_pool, NULL, rq)) > return; > > - kmem_cache_free(global.slab_requests, rq); > + kmem_cache_free(slab_requests, rq); > } > > const struct dma_fence_ops i915_fence_ops = { > @@ -176,7 +172,7 @@ static void irq_execute_cb(struct irq_work *wrk) > struct execute_cb *cb = container_of(wrk, typeof(*cb), work); > > i915_sw_fence_complete(cb->fence); > - kmem_cache_free(global.slab_execute_cbs, cb); > + kmem_cache_free(slab_execute_cbs, cb); > } > > static __always_inline void > @@ -514,7 +510,7 @@ __await_execution(struct i915_request *rq, > if (i915_request_is_active(signal)) > return 0; > > - cb = kmem_cache_alloc(global.slab_execute_cbs, gfp); > + cb = kmem_cache_alloc(slab_execute_cbs, gfp); > if (!cb) > return 
-ENOMEM; > > @@ -868,7 +864,7 @@ request_alloc_slow(struct intel_timeline *tl, > rq = list_first_entry(&tl
Re: [PATCH 08/10] drm/i915: move scheduler slabs to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_dependencies|priorities to just a > slab_dependencies|priorities. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_globals.c | 2 -- > drivers/gpu/drm/i915/i915_globals.h | 2 -- > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > drivers/gpu/drm/i915/i915_scheduler.c | 39 +++ > drivers/gpu/drm/i915/i915_scheduler.h | 3 +++ > 5 files changed, 20 insertions(+), 28 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index 8fffa8d93bc5..8923589057ab 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -8,7 +8,6 @@ > #include > > #include "i915_globals.h" > -#include "i915_scheduler.h" > #include "i915_vma.h" > > static LIST_HEAD(globals); > @@ -29,7 +28,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_scheduler_init, > i915_global_vma_init, > }; > > diff --git a/drivers/gpu/drm/i915/i915_globals.h > b/drivers/gpu/drm/i915/i915_globals.h > index 9734740708f4..7a57bce1da05 100644 > --- a/drivers/gpu/drm/i915/i915_globals.h > +++ b/drivers/gpu/drm/i915/i915_globals.h > @@ -23,8 +23,6 @@ int i915_globals_init(void); > void i915_globals_exit(void); > > /* constructors */ > -int i915_global_request_init(void); > -int i915_global_scheduler_init(void); > int i915_global_vma_init(void); > > #endif /* _I915_GLOBALS_H_ */ > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index bb2bd12fb8c2..a44318519977 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -39,6 +39,7 @@ > #include "i915_perf.h" > #include "i915_globals.h" > 
#include "i915_selftest.h" > +#include "i915_scheduler.h" > > #define PLATFORM(x) .platform = (x) > #define GEN(x) \ > @@ -1304,6 +1305,7 @@ static const struct { > { i915_gem_context_module_init, i915_gem_context_module_exit }, > { i915_objects_module_init, i915_objects_module_exit }, > { i915_request_module_init, i915_request_module_exit }, > + { i915_scheduler_module_init, i915_scheduler_module_exit }, > { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, NULL }, > { i915_pmu_init, i915_pmu_exit }, > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c > b/drivers/gpu/drm/i915/i915_scheduler.c > index 561c649e59f7..02d90d239ff5 100644 > --- a/drivers/gpu/drm/i915/i915_scheduler.c > +++ b/drivers/gpu/drm/i915/i915_scheduler.c > @@ -7,15 +7,11 @@ > #include > > #include "i915_drv.h" > -#include "i915_globals.h" > #include "i915_request.h" > #include "i915_scheduler.h" > > -static struct i915_global_scheduler { > - struct i915_global base; > - struct kmem_cache *slab_dependencies; > - struct kmem_cache *slab_priorities; > -} global; > +struct kmem_cache *slab_dependencies; static > +struct kmem_cache *slab_priorities; static > > static DEFINE_SPINLOCK(schedule_lock); > > @@ -93,7 +89,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine > *sched_engine, int prio) > if (prio == I915_PRIORITY_NORMAL) { > p = &sched_engine->default_priolist; > } else { > - p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); > + p = kmem_cache_alloc(slab_priorities, GFP_ATOMIC); > /* Convert an allocation failure to a priority bump */ > if (unlikely(!p)) { > prio = I915_PRIORITY_NORMAL; /* recurses just once */ > @@ -122,7 +118,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine > *sched_engine, int prio) > > void __i915_priolist_free(struct i915_priolist *p) > { > - kmem_cache_free(global.slab_priorities, p); > + kmem_cache_free(slab_priorities, p); > } > > struct sched_cache { > @@ -313,13 +309,13 @@ void i915_sched_node_reinit(struct i915_sched_node > 
*node) > static struct i915_dependency * > i915_dependency_alloc(void) > { > - return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL); > + return kmem_cache_a
Re: [PATCH 09/10] drm/i915: move vma slab to direct module init/exit
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > With the global kmem_cache shrink infrastructure gone there's nothing > special and we can convert them over. > > I'm doing this split up into each patch because there's quite a bit of > noise with removing the static global.slab_vmas to just a > slab_vmas. > > We have to keep i915_drv.h include in i915_globals otherwise there's > nothing anymore that pulls in GEM_BUG_ON. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_globals.c | 3 +-- > drivers/gpu/drm/i915/i915_globals.h | 3 --- > drivers/gpu/drm/i915/i915_pci.c | 2 ++ > drivers/gpu/drm/i915/i915_vma.c | 25 - > drivers/gpu/drm/i915/i915_vma.h | 3 +++ > 5 files changed, 14 insertions(+), 22 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > index 8923589057ab..04979789e7be 100644 > --- a/drivers/gpu/drm/i915/i915_globals.c > +++ b/drivers/gpu/drm/i915/i915_globals.c > @@ -8,7 +8,7 @@ > #include > > #include "i915_globals.h" > -#include "i915_vma.h" > +#include "i915_drv.h" > > static LIST_HEAD(globals); > > @@ -28,7 +28,6 @@ static void __i915_globals_cleanup(void) > } > > static __initconst int (* const initfn[])(void) = { > - i915_global_vma_init, > }; > > int __init i915_globals_init(void) > diff --git a/drivers/gpu/drm/i915/i915_globals.h > b/drivers/gpu/drm/i915/i915_globals.h > index 7a57bce1da05..57d2998bba45 100644 > --- a/drivers/gpu/drm/i915/i915_globals.h > +++ b/drivers/gpu/drm/i915/i915_globals.h > @@ -22,7 +22,4 @@ void i915_global_register(struct i915_global *global); > int i915_globals_init(void); > void i915_globals_exit(void); > > -/* constructors */ > -int i915_global_vma_init(void); > - > #endif /* _I915_GLOBALS_H_ */ > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index a44318519977..0affcf33a211 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -40,6 +40,7 @@ > 
#include "i915_globals.h" > #include "i915_selftest.h" > #include "i915_scheduler.h" > +#include "i915_vma.h" > > #define PLATFORM(x) .platform = (x) > #define GEN(x) \ > @@ -1306,6 +1307,7 @@ static const struct { > { i915_objects_module_init, i915_objects_module_exit }, > { i915_request_module_init, i915_request_module_exit }, > { i915_scheduler_module_init, i915_scheduler_module_exit }, > + { i915_vma_module_init, i915_vma_module_exit }, > { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, NULL }, > { i915_pmu_init, i915_pmu_exit }, > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c > index 09a7c47926f7..d094e2016b93 100644 > --- a/drivers/gpu/drm/i915/i915_vma.c > +++ b/drivers/gpu/drm/i915/i915_vma.c > @@ -34,24 +34,20 @@ > #include "gt/intel_gt_requests.h" > > #include "i915_drv.h" > -#include "i915_globals.h" > #include "i915_sw_fence_work.h" > #include "i915_trace.h" > #include "i915_vma.h" > > -static struct i915_global_vma { > - struct i915_global base; > - struct kmem_cache *slab_vmas; > -} global; > +struct kmem_cache *slab_vmas; static. 
With that, Reviewed-by: Jason Ekstrand > > struct i915_vma *i915_vma_alloc(void) > { > - return kmem_cache_zalloc(global.slab_vmas, GFP_KERNEL); > + return kmem_cache_zalloc(slab_vmas, GFP_KERNEL); > } > > void i915_vma_free(struct i915_vma *vma) > { > - return kmem_cache_free(global.slab_vmas, vma); > + return kmem_cache_free(slab_vmas, vma); > } > > #if IS_ENABLED(CONFIG_DRM_I915_ERRLOG_GEM) && IS_ENABLED(CONFIG_DRM_DEBUG_MM) > @@ -1414,21 +1410,16 @@ void i915_vma_make_purgeable(struct i915_vma *vma) > #include "selftests/i915_vma.c" > #endif > > -static void i915_global_vma_exit(void) > +void i915_vma_module_exit(void) > { > - kmem_cache_destroy(global.slab_vmas); > + kmem_cache_destroy(slab_vmas); > } > > -static struct i915_global_vma global = { { > - .exit = i915_global_vma_exit, > -} }; > - > -int __init i915_global_vma_init(void) > +int __init i915_vma_module_init(void) > { > - global.slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN); > - if (!global.slab_vmas) > + slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN); > + if (!slab_vmas) > return -ENOMEM; > > - i915_global_register(&global.base); > return 0; > } > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index eca452a9851f..ed69f66c7ab0 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -426,4 +426,7 @@ static inline int i915_vma_sync(struct i915_vma *vma) > return i915_active_wait(&vma->active); > } > > +void i915_vma_module_exit(void); > +int i915_vma_module_init(void); > + > #endif > -- > 2.32.0 >
Re: [PATCH 10/10] drm/i915: Remove i915_globals
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter wrote: > > No longer used. > > Cc: Jason Ekstrand > Signed-off-by: Daniel Vetter Reviewed-by: Jason Ekstrand But, also, tvrtko is right that dumping all that stuff in i915_pci.c isn't great. Mind typing a quick follow-on that moves i915_init/exit to i915_drv.c? --Jason > --- > drivers/gpu/drm/i915/Makefile | 1 - > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 1 - > drivers/gpu/drm/i915/i915_globals.c | 53 --- > drivers/gpu/drm/i915/i915_globals.h | 25 - > drivers/gpu/drm/i915/i915_pci.c | 2 - > 5 files changed, 82 deletions(-) > delete mode 100644 drivers/gpu/drm/i915/i915_globals.c > delete mode 100644 drivers/gpu/drm/i915/i915_globals.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 10b3bb6207ba..9022dc638ed6 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -166,7 +166,6 @@ i915-y += \ > i915_gem_gtt.o \ > i915_gem_ww.o \ > i915_gem.o \ > - i915_globals.o \ > i915_query.o \ > i915_request.o \ > i915_scheduler.o \ > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > index d86825437516..943c1d416ec0 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > @@ -6,7 +6,6 @@ > #include > > #include "i915_drv.h" > -#include "i915_globals.h" > #include "i915_params.h" > #include "intel_context.h" > #include "intel_engine_pm.h" > diff --git a/drivers/gpu/drm/i915/i915_globals.c > b/drivers/gpu/drm/i915/i915_globals.c > deleted file mode 100644 > index 04979789e7be.. 
> --- a/drivers/gpu/drm/i915/i915_globals.c > +++ /dev/null > @@ -1,53 +0,0 @@ > -/* > - * SPDX-License-Identifier: MIT > - * > - * Copyright © 2019 Intel Corporation > - */ > - > -#include > -#include > - > -#include "i915_globals.h" > -#include "i915_drv.h" > - > -static LIST_HEAD(globals); > - > -void __init i915_global_register(struct i915_global *global) > -{ > - GEM_BUG_ON(!global->exit); > - > - list_add_tail(&global->link, &globals); > -} > - > -static void __i915_globals_cleanup(void) > -{ > - struct i915_global *global, *next; > - > - list_for_each_entry_safe_reverse(global, next, &globals, link) > - global->exit(); > -} > - > -static __initconst int (* const initfn[])(void) = { > -}; > - > -int __init i915_globals_init(void) > -{ > - int i; > - > - for (i = 0; i < ARRAY_SIZE(initfn); i++) { > - int err; > - > - err = initfn[i](); > - if (err) { > - __i915_globals_cleanup(); > - return err; > - } > - } > - > - return 0; > -} > - > -void i915_globals_exit(void) > -{ > - __i915_globals_cleanup(); > -} > diff --git a/drivers/gpu/drm/i915/i915_globals.h > b/drivers/gpu/drm/i915/i915_globals.h > deleted file mode 100644 > index 57d2998bba45.. 
> --- a/drivers/gpu/drm/i915/i915_globals.h > +++ /dev/null > @@ -1,25 +0,0 @@ > -/* > - * SPDX-License-Identifier: MIT > - * > - * Copyright © 2019 Intel Corporation > - */ > - > -#ifndef _I915_GLOBALS_H_ > -#define _I915_GLOBALS_H_ > - > -#include > - > -typedef void (*i915_global_func_t)(void); > - > -struct i915_global { > - struct list_head link; > - > - i915_global_func_t exit; > -}; > - > -void i915_global_register(struct i915_global *global); > - > -int i915_globals_init(void); > -void i915_globals_exit(void); > - > -#endif /* _I915_GLOBALS_H_ */ > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 0affcf33a211..ed72bcb58331 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -37,7 +37,6 @@ > #include "gem/i915_gem_object.h" > #include "i915_request.h" > #include "i915_perf.h" > -#include "i915_globals.h" > #include "i915_selftest.h" > #include "i915_scheduler.h" > #include "i915_vma.h" > @@ -1308,7 +1307,6 @@ static const struct { > { i915_request_module_init, i915_request_module_exit }, > { i915_scheduler_module_init, i915_scheduler_module_exit }, > { i915_vma_module_init, i915_vma_module_exit }, > - { i915_globals_init, i915_globals_exit }, > { i915_mock_selftests, NULL }, > { i915_pmu_init, i915_pmu_exit }, > { i915_register_pci_driver, i915_unregister_pci_driver }, > -- > 2.32.0 >
Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit
On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin wrote: > On 26/07/2021 16:42, Jason Ekstrand wrote: > > On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand > > wrote: > >> > >> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin > >> wrote: > >>> > >>> > >>> On 23/07/2021 20:29, Daniel Vetter wrote: > >>>> With the global kmem_cache shrink infrastructure gone there's nothing > >>>> special and we can convert them over. > >>>> > >>>> I'm doing this split up into each patch because there's quite a bit of > >>>> noise with removing the static global.slab_ce to just a > >>>> slab_ce. > >>>> > >>>> Cc: Jason Ekstrand > >>>> Signed-off-by: Daniel Vetter > >>>> --- > >>>>drivers/gpu/drm/i915/gt/intel_context.c | 25 - > >>>>drivers/gpu/drm/i915/gt/intel_context.h | 3 +++ > >>>>drivers/gpu/drm/i915/i915_globals.c | 2 -- > >>>>drivers/gpu/drm/i915/i915_globals.h | 1 - > >>>>drivers/gpu/drm/i915/i915_pci.c | 2 ++ > >>>>5 files changed, 13 insertions(+), 20 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > >>>> b/drivers/gpu/drm/i915/gt/intel_context.c > >>>> index baa05fddd690..283382549a6f 100644 > >>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c > >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c > >>>> @@ -7,7 +7,6 @@ > >>>>#include "gem/i915_gem_pm.h" > >>>> > >>>>#include "i915_drv.h" > >>>> -#include "i915_globals.h" > >>>>#include "i915_trace.h" > >>>> > >>>>#include "intel_context.h" > >>>> @@ -15,14 +14,11 @@ > >>>>#include "intel_engine_pm.h" > >>>>#include "intel_ring.h" > >>>> > >>>> -static struct i915_global_context { > >>>> - struct i915_global base; > >>>> - struct kmem_cache *slab_ce; > >>>> -} global; > >>>> +struct kmem_cache *slab_ce; > >> > >> Static? 
With that, > >> > >> Reviewed-by: Jason Ekstrand > >> > >>>> > >>>>static struct intel_context *intel_context_alloc(void) > >>>>{ > >>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); > >>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL); > >>>>} > >>>> > >>>>static void rcu_context_free(struct rcu_head *rcu) > >>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu) > >>>>struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); > >>>> > >>>>trace_intel_context_free(ce); > >>>> - kmem_cache_free(global.slab_ce, ce); > >>>> + kmem_cache_free(slab_ce, ce); > >>>>} > >>>> > >>>>void intel_context_free(struct intel_context *ce) > >>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce) > >>>>i915_active_fini(&ce->active); > >>>>} > >>>> > >>>> -static void i915_global_context_exit(void) > >>>> +void i915_context_module_exit(void) > >>>>{ > >>>> - kmem_cache_destroy(global.slab_ce); > >>>> + kmem_cache_destroy(slab_ce); > >>>>} > >>>> > >>>> -static struct i915_global_context global = { { > >>>> - .exit = i915_global_context_exit, > >>>> -} }; > >>>> - > >>>> -int __init i915_global_context_init(void) > >>>> +int __init i915_context_module_init(void) > >>>>{ > >>>> - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > >>>> - if (!global.slab_ce) > >>>> + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN); > >>>> + if (!slab_ce) > >>>>return -ENOMEM; > >>>> > >>>> - i915_global_register(&global.base); > >>>>return 0; > >>>>} > >>>> > >>>> diff --gi
Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit
On Mon, Jul 26, 2021 at 11:31 AM Tvrtko Ursulin wrote: > > > On 26/07/2021 17:20, Jason Ekstrand wrote: > > On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin > > wrote: > >> On 26/07/2021 16:42, Jason Ekstrand wrote: > >>> On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand > >>> wrote: > >>>> > >>>> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin > >>>> wrote: > >>>>> > >>>>> > >>>>> On 23/07/2021 20:29, Daniel Vetter wrote: > >>>>>> With the global kmem_cache shrink infrastructure gone there's nothing > >>>>>> special and we can convert them over. > >>>>>> > >>>>>> I'm doing this split up into each patch because there's quite a bit of > >>>>>> noise with removing the static global.slab_ce to just a > >>>>>> slab_ce. > >>>>>> > >>>>>> Cc: Jason Ekstrand > >>>>>> Signed-off-by: Daniel Vetter > >>>>>> --- > >>>>>> drivers/gpu/drm/i915/gt/intel_context.c | 25 > >>>>>> - > >>>>>> drivers/gpu/drm/i915/gt/intel_context.h | 3 +++ > >>>>>> drivers/gpu/drm/i915/i915_globals.c | 2 -- > >>>>>> drivers/gpu/drm/i915/i915_globals.h | 1 - > >>>>>> drivers/gpu/drm/i915/i915_pci.c | 2 ++ > >>>>>> 5 files changed, 13 insertions(+), 20 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > >>>>>> b/drivers/gpu/drm/i915/gt/intel_context.c > >>>>>> index baa05fddd690..283382549a6f 100644 > >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c > >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c > >>>>>> @@ -7,7 +7,6 @@ > >>>>>> #include "gem/i915_gem_pm.h" > >>>>>> > >>>>>> #include "i915_drv.h" > >>>>>> -#include "i915_globals.h" > >>>>>> #include "i915_trace.h" > >>>>>> > >>>>>> #include "intel_context.h" > >>>>>> @@ -15,14 +14,11 @@ > >>>>>> #include "intel_engine_pm.h" > >>>>>> #include "intel_ring.h" > >>>>>> > >>>>>> -static struct i915_global_context { > >>>>>> - struct i915_global base; > >>>>>> - struct kmem_cache *slab_ce; > >>>>>> -} global; > >>>>>> +struct kmem_cache *slab_ce; > >>>> > >>>> Static? 
With that, > >>>> > >>>> Reviewed-by: Jason Ekstrand > >>>> > >>>>>> > >>>>>> static struct intel_context *intel_context_alloc(void) > >>>>>> { > >>>>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); > >>>>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL); > >>>>>> } > >>>>>> > >>>>>> static void rcu_context_free(struct rcu_head *rcu) > >>>>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu) > >>>>>> struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); > >>>>>> > >>>>>> trace_intel_context_free(ce); > >>>>>> - kmem_cache_free(global.slab_ce, ce); > >>>>>> + kmem_cache_free(slab_ce, ce); > >>>>>> } > >>>>>> > >>>>>> void intel_context_free(struct intel_context *ce) > >>>>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce) > >>>>>> i915_active_fini(&ce->active); > >>>>>> } > >>>>>> > >>>>>> -static void i915_global_context_exit(void) > >>>>>> +void i915_context_module_exit(void) > >>>>>> { > >>>>>> - kmem_cache_destroy(global.slab_ce); > >>>>>> + kmem_cache_destroy(slab_ce); > >>>>>> } > >>>>>> > >>>>>
Re: [PATCH v2 11/11] drm/i915: Extract i915_module.c
On Tue, Jul 27, 2021 at 9:44 AM Tvrtko Ursulin wrote: > > > On 27/07/2021 13:10, Daniel Vetter wrote: > > The module init code is somewhat misplaced in i915_pci.c, since it > > needs to pull in init/exit functions from every part of the driver and > > pollutes the include list a lot. > > > > Extract an i915_module.c file which pulls all the bits together, and > > allows us to massively trim the include list of i915_pci.c. > > > > The downside is that we have to drop the error path check Jason added to > > catch when we set up the pci driver too early. I think that risk is > > acceptable for this pretty nice include. > > i915_module.c is an improvement and the rest for me is not extremely > objectionable by the end of this incarnation, but I also do not see it > as an improvement really. It's not a big improvement to be sure, but I think there are a few ways this is nicer: 1. One less level of indirection to sort through. 2. The init/exit table is generally simpler than the i915_global interface. 3. It's easy to forget i915_global_register but forgetting to put an _exit function in the module init table is a lot more obvious. None of those are deal-breakers but they're kind-of nice. Anyway, this one is also Reviewed-by: Jason Ekstrand --Jason > There was a bug to fix relating to mock tests, but that is where the > exercise should have stopped for now. After that it IMHO spiraled out of > control, not least the unjustifiably expedited removal of cache > shrinking. On balance for me it is too churny and boils down to two > extremely capable people spending time on kind of really unimportant > side fiddles. And I do not intend to prescribe you what to do, just > expressing my bewilderment. FWIW... I can only say my opinion as it is, not > that it matters a lot. 
> > Regards, > > Tvrtko > > > Cc: Jason Ekstrand > > Cc: Tvrtko Ursulin > > Signed-off-by: Daniel Vetter > > --- > > drivers/gpu/drm/i915/Makefile | 1 + > > drivers/gpu/drm/i915/i915_module.c | 113 > > drivers/gpu/drm/i915/i915_pci.c| 117 + > > drivers/gpu/drm/i915/i915_pci.h| 8 ++ > > 4 files changed, 125 insertions(+), 114 deletions(-) > > create mode 100644 drivers/gpu/drm/i915/i915_module.c > > create mode 100644 drivers/gpu/drm/i915/i915_pci.h > > > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > > index 9022dc638ed6..4ebd9f417ddb 100644 > > --- a/drivers/gpu/drm/i915/Makefile > > +++ b/drivers/gpu/drm/i915/Makefile > > @@ -38,6 +38,7 @@ i915-y += i915_drv.o \ > > i915_irq.o \ > > i915_getparam.o \ > > i915_mitigations.o \ > > + i915_module.o \ > > i915_params.o \ > > i915_pci.o \ > > i915_scatterlist.o \ > > diff --git a/drivers/gpu/drm/i915/i915_module.c > > b/drivers/gpu/drm/i915/i915_module.c > > new file mode 100644 > > index ..c578ea8f56a0 > > --- /dev/null > > +++ b/drivers/gpu/drm/i915/i915_module.c > > @@ -0,0 +1,113 @@ > > +/* > > + * SPDX-License-Identifier: MIT > > + * > > + * Copyright © 2021 Intel Corporation > > + */ > > + > > +#include > > + > > +#include "gem/i915_gem_context.h" > > +#include "gem/i915_gem_object.h" > > +#include "i915_active.h" > > +#include "i915_buddy.h" > > +#include "i915_params.h" > > +#include "i915_pci.h" > > +#include "i915_perf.h" > > +#include "i915_request.h" > > +#include "i915_scheduler.h" > > +#include "i915_selftest.h" > > +#include "i915_vma.h" > > + > > +static int i915_check_nomodeset(void) > > +{ > > + bool use_kms = true; > > + > > + /* > > + * Enable KMS by default, unless explicitly overriden by > > + * either the i915.modeset prarameter or by the > > + * vga_text_mode_force boot option. 
> > + */ > > + > > + if (i915_modparams.modeset == 0) > > + use_kms = false; > > + > > + if (vgacon_text_force() && i915_modparams.modeset == -1) > > + use_kms = false; > > + > > + if (!use_kms) { > > + /* Silently fail loading to not upset userspace. */ > > + DRM_DEBUG_DRIVER("KMS disabled.\n"); > > + return 1; > > + } > > + > > + return 0; > > +} > > + > > +static const struct { >
Re: [PATCH] drm/i915/selftests: prefer the create_user helper
On July 28, 2021 10:57:23 Matthew Auld wrote: No need to hand roll the set_placements stuff, now that we have a helper for this. Also no need to handle the -ENODEV case here, since NULL mr implies missing device support, where the for_each_memory_region helper will always skip over such regions. Signed-off-by: Matthew Auld Cc: Jason Ekstrand Reviewed-by: Jason Ekstrand --- .../drm/i915/gem/selftests/i915_gem_mman.c| 46 ++- 1 file changed, 4 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 0b2b73d8a364..eed1c2c64e75 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -860,24 +860,6 @@ static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type) return !no_map; } -static void object_set_placements(struct drm_i915_gem_object *obj, - struct intel_memory_region **placements, - unsigned int n_placements) -{ - GEM_BUG_ON(!n_placements); - - if (n_placements == 1) { - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct intel_memory_region *mr = placements[0]; - - obj->mm.placements = &i915->mm.regions[mr->id]; - obj->mm.n_placements = 1; - } else { - obj->mm.placements = placements; - obj->mm.n_placements = n_placements; - } -} - #define expand32(x) (((x) << 0) | ((x) << 8) | ((x) << 16) | ((x) << 24)) static int __igt_mmap(struct drm_i915_private *i915, struct drm_i915_gem_object *obj, @@ -972,15 +954,10 @@ static int igt_mmap(void *arg) struct drm_i915_gem_object *obj; int err; - obj = i915_gem_object_create_region(mr, sizes[i], 0, I915_BO_ALLOC_USER); - if (obj == ERR_PTR(-ENODEV)) - continue; - + obj = __i915_gem_object_create_user(i915, sizes[i], &mr, 1); if (IS_ERR(obj)) return PTR_ERR(obj); - object_set_placements(obj, &mr, 1); - err = __igt_mmap(i915, obj, I915_MMAP_TYPE_GTT); if (err == 0) err = __igt_mmap(i915, obj, I915_MMAP_TYPE_WC); @@ -1101,15 
+1078,10 @@ static int igt_mmap_access(void *arg) struct drm_i915_gem_object *obj; int err; - obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER); - if (obj == ERR_PTR(-ENODEV)) - continue; - + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1); if (IS_ERR(obj)) return PTR_ERR(obj); - object_set_placements(obj, &mr, 1); - err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_GTT); if (err == 0) err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WB); @@ -1248,15 +1220,10 @@ static int igt_mmap_gpu(void *arg) struct drm_i915_gem_object *obj; int err; - obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER); - if (obj == ERR_PTR(-ENODEV)) - continue; - + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1); if (IS_ERR(obj)) return PTR_ERR(obj); - object_set_placements(obj, &mr, 1); - err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_GTT); if (err == 0) err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_WC); @@ -1405,15 +1372,10 @@ static int igt_mmap_revoke(void *arg) struct drm_i915_gem_object *obj; int err; - obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER); - if (obj == ERR_PTR(-ENODEV)) - continue; - + obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1); if (IS_ERR(obj)) return PTR_ERR(obj); - object_set_placements(obj, &mr, 1); - err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_GTT); if (err == 0) err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_WC); -- 2.26.3
Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation
On Tue, Aug 3, 2021 at 10:09 AM Daniel Vetter wrote: > On Wed, Jul 28, 2021 at 4:22 PM Matthew Auld > wrote: > > > > On Mon, 26 Jul 2021 at 17:10, Tvrtko Ursulin > > wrote: > > > > > > > > > On 26/07/2021 16:14, Jason Ekstrand wrote: > > > > On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst > > > > wrote: > > > >> > > > >> Op 23-07-2021 om 13:34 schreef Matthew Auld: > > > >>> From: Chris Wilson > > > >>> > > > >>> Jason Ekstrand requested a more efficient method than > > > >>> userptr+set-domain > > > >>> to determine if the userptr object was backed by a complete set of > > > >>> pages > > > >>> upon creation. To be more efficient than simply populating the userptr > > > >>> using get_user_pages() (as done by the call to set-domain or execbuf), > > > >>> we can walk the tree of vm_area_struct and check for gaps or vma not > > > >>> backed by struct page (VM_PFNMAP). The question is how to handle > > > >>> VM_MIXEDMAP which may be either struct page or pfn backed... > > > >>> > > > >>> With discrete we are going to drop support for set_domain(), so > > > >>> offering > > > >>> a way to probe the pages, without having to resort to dummy batches > > > >>> has > > > >>> been requested. > > > >>> > > > >>> v2: > > > >>> - add new query param for the PROBE flag, so userspace can easily > > > >>>check if the kernel supports it(Jason). > > > >>> - use mmap_read_{lock, unlock}. > > > >>> - add some kernel-doc. > > > >>> v3: > > > >>> - In the docs also mention that PROBE doesn't guarantee that the pages > > > >>>will remain valid by the time they are actually used(Tvrtko). > > > >>> - Add a small comment for the hole finding logic(Jason). > > > >>> - Move the param next to all the other params which just return true. 
> > > >>> > > > >>> Testcase: igt/gem_userptr_blits/probe > > > >>> Signed-off-by: Chris Wilson > > > >>> Signed-off-by: Matthew Auld > > > >>> Cc: Thomas Hellström > > > >>> Cc: Maarten Lankhorst > > > >>> Cc: Tvrtko Ursulin > > > >>> Cc: Jordan Justen > > > >>> Cc: Kenneth Graunke > > > >>> Cc: Jason Ekstrand > > > >>> Cc: Daniel Vetter > > > >>> Cc: Ramalingam C > > > >>> Reviewed-by: Tvrtko Ursulin > > > >>> Acked-by: Kenneth Graunke > > > >>> Reviewed-by: Jason Ekstrand > > > >>> --- > > > >>> drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 > > > >>> - > > > >>> drivers/gpu/drm/i915/i915_getparam.c| 1 + > > > >>> include/uapi/drm/i915_drm.h | 20 ++ > > > >>> 3 files changed, 61 insertions(+), 1 deletion(-) > > > >>> > > > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > >>> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > >>> index 56edfeff8c02..468a7a617fbf 100644 > > > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c > > > >>> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops > > > >>> i915_gem_userptr_ops = { > > > >>> > > > >>> #endif > > > >>> > > > >>> +static int > > > >>> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long > > > >>> len) > > > >>> +{ > > > >>> + const unsigned long end = addr + len; > > > >>> + struct vm_area_struct *vma; > > > >>> + int ret = -EFAULT; > > > >>> + > > > >>> + mmap_read_lock(mm); > > > >>> + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { > > > >>> + /* Check for holes, note that we also update the addr > > > >>> below */ > > > >>> + if (vma->vm_start > addr) > >
Re: [PATCH 2/2] drm/i915: delete gpu reloc code
Both are Reviewed-by: Jason Ekstrand On Tue, Aug 3, 2021 at 7:49 AM Daniel Vetter wrote: > > It's already removed, this just garbage collects it all. > > v2: Rebase over s/GEN/GRAPHICS_VER/ > > v3: Also ditch eb.reloc_pool and eb.reloc_context (Maarten) > > Signed-off-by: Daniel Vetter > Cc: Jon Bloomfield > Cc: Chris Wilson > Cc: Maarten Lankhorst > Cc: Daniel Vetter > Cc: Joonas Lahtinen > Cc: "Thomas Hellström" > Cc: Matthew Auld > Cc: Lionel Landwerlin > Cc: Dave Airlie > Cc: Jason Ekstrand > --- > .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 360 +- > .../drm/i915/selftests/i915_live_selftests.h | 1 - > 2 files changed, 1 insertion(+), 360 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > index e4dc4c3b4df3..98e25efffb59 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > @@ -277,16 +277,8 @@ struct i915_execbuffer { > bool has_llc : 1; > bool has_fence : 1; > bool needs_unfenced : 1; > - > - struct i915_request *rq; > - u32 *rq_cmd; > - unsigned int rq_size; > - struct intel_gt_buffer_pool_node *pool; > } reloc_cache; > > - struct intel_gt_buffer_pool_node *reloc_pool; /** relocation pool for > -EDEADLK handling */ > - struct intel_context *reloc_context; > - > u64 invalid_flags; /** Set of execobj.flags that are invalid */ > > u64 batch_len; /** Length of batch within object */ > @@ -1035,8 +1027,6 @@ static void eb_release_vmas(struct i915_execbuffer *eb, > bool final) > > static void eb_destroy(const struct i915_execbuffer *eb) > { > - GEM_BUG_ON(eb->reloc_cache.rq); > - > if (eb->lut_size > 0) > kfree(eb->buckets); > } > @@ -1048,14 +1038,6 @@ relocation_target(const struct > drm_i915_gem_relocation_entry *reloc, > return gen8_canonical_addr((int)reloc->delta + target->node.start); > } > > -static void reloc_cache_clear(struct reloc_cache *cache) > -{ > - cache->rq = NULL; > - cache->rq_cmd = NULL; > - 
cache->pool = NULL; > - cache->rq_size = 0; > -} > - > static void reloc_cache_init(struct reloc_cache *cache, > struct drm_i915_private *i915) > { > @@ -1068,7 +1050,6 @@ static void reloc_cache_init(struct reloc_cache *cache, > cache->has_fence = cache->graphics_ver < 4; > cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; > cache->node.flags = 0; > - reloc_cache_clear(cache); > } > > static inline void *unmask_page(unsigned long p) > @@ -1090,48 +1071,10 @@ static inline struct i915_ggtt *cache_to_ggtt(struct > reloc_cache *cache) > return &i915->ggtt; > } > > -static void reloc_cache_put_pool(struct i915_execbuffer *eb, struct > reloc_cache *cache) > -{ > - if (!cache->pool) > - return; > - > - /* > -* This is a bit nasty, normally we keep objects locked until the end > -* of execbuffer, but we already submit this, and have to unlock > before > -* dropping the reference. Fortunately we can only hold 1 pool node at > -* a time, so this should be harmless. > -*/ > - i915_gem_ww_unlock_single(cache->pool->obj); > - intel_gt_buffer_pool_put(cache->pool); > - cache->pool = NULL; > -} > - > -static void reloc_gpu_flush(struct i915_execbuffer *eb, struct reloc_cache > *cache) > -{ > - struct drm_i915_gem_object *obj = cache->rq->batch->obj; > - > - GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32)); > - cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END; > - > - i915_gem_object_flush_map(obj); > - i915_gem_object_unpin_map(obj); > - > - intel_gt_chipset_flush(cache->rq->engine->gt); > - > - i915_request_add(cache->rq); > - reloc_cache_put_pool(eb, cache); > - reloc_cache_clear(cache); > - > - eb->reloc_pool = NULL; > -} > - > static void reloc_cache_reset(struct reloc_cache *cache, struct > i915_execbuffer *eb) > { > void *vaddr; > > - if (cache->rq) > - reloc_gpu_flush(eb, cache); > - > if (!cache->vaddr) > return; > > @@ -1313,295 +1256,6 @@ static void clflush_write32(
Re: [Intel-gfx] [RFC PATCH v2] drm/doc/rfc: i915 DG1 uAPI
+mesa-dev and some Intel mesa people. On Wed, Apr 14, 2021 at 5:23 AM Daniel Vetter wrote: > > On Tue, Apr 13, 2021 at 12:47:06PM +0100, Matthew Auld wrote: > > Add an entry for the new uAPI needed for DG1. > > > > v2(Daniel): > > - include the overall upstreaming plan > > - add a note for mmap, there are differences here for TTM vs i915 > > - bunch of other suggestions from Daniel > > > > Signed-off-by: Matthew Auld > > Cc: Joonas Lahtinen > > Cc: Daniel Vetter > > Cc: Dave Airlie > > Bunch more thoughts below, I think we're getting there. Thanks for doing > this. > > > --- > > Documentation/gpu/rfc/i915_gem_lmem.h | 151 > > Documentation/gpu/rfc/i915_gem_lmem.rst | 119 +++ > > Documentation/gpu/rfc/index.rst | 4 + > > 3 files changed, 274 insertions(+) > > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > > > > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > > b/Documentation/gpu/rfc/i915_gem_lmem.h > > new file mode 100644 > > index ..6ae13209d7ef > > --- /dev/null > > +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > > @@ -0,0 +1,151 @@ > > +/* The new query_id for struct drm_i915_query_item */ > > +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > > + > > +/** > > + * enum drm_i915_gem_memory_class > > + */ > > +enum drm_i915_gem_memory_class { > > Are we really going with enum in uapi? I thought that was fraught with > peril since the integer type of enum is quite a bit up to compilers. But > maybe I'm just scared. It looks to me like it's a __u16 below. That should be fine. We don't really need to give the enum type a name in that case, though. 
> > + /** @I915_MEMORY_CLASS_SYSTEM: system memory */ > > + I915_MEMORY_CLASS_SYSTEM = 0, > > + /** @I915_MEMORY_CLASS_DEVICE: device local-memory */ > > + I915_MEMORY_CLASS_DEVICE, > > +}; > > + > > +/** > > + * struct drm_i915_gem_memory_class_instance > > + */ > > +struct drm_i915_gem_memory_class_instance { > > + /** @memory_class: see enum drm_i915_gem_memory_class */ > > + __u16 memory_class; > > + > > + /** @memory_instance: which instance */ > > + __u16 memory_instance; > > +}; > > + > > +/** > > + * struct drm_i915_memory_region_info > > + * > > + * Describes one region as known to the driver. > > + */ > > +struct drm_i915_memory_region_info { > > + /** @region: class:instance pair encoding */ > > + struct drm_i915_gem_memory_class_instance region; > > + > > + /** @rsvd0: MBZ */ > > + __u32 rsvd0; > > + > > + /** @caps: MBZ */ > > + __u64 caps; > > + > > + /** @flags: MBZ */ > > + __u64 flags; > > + > > + /** @probed_size: Memory probed by the driver (-1 = unknown) */ > > + __u64 probed_size; > > + > > + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ > > + __u64 unallocated_size; > > + > > + /** @rsvd1: MBZ */ > > + __u64 rsvd1[8]; > > I guess this is for future stuff that becomes relevant with multi-tile? > Might be worth explaining in 1-2 words why we reserve a pile here. Also > it doesn't matter ofc for performance here :-) > > > +}; > > + > > +/** > > + * struct drm_i915_query_memory_regions > > + * > > + * Region info query enumerates all regions known to the driver by filling > > in > > + * an array of struct drm_i915_memory_region_info structures. > > I guess this works with the usual 1. query number of regions 2. get them > all two-step ioctl flow? Worth explaining here. 
> > > + */ > > +struct drm_i915_query_memory_regions { > > + /** @num_regions: Number of supported regions */ > > + __u32 num_regions; > > + > > + /** @rsvd: MBZ */ > > + __u32 rsvd[3]; > > + > > + /** @regions: Info about each supported region */ > > + struct drm_i915_memory_region_info regions[]; > > +}; > > Hm don't we need a query ioctl for this too? > > > + > > +#define DRM_I915_GEM_CREATE_EXT 0xdeadbeaf > > +#define DRM_IOCTL_I915_GEM_CREATE_EXTDRM_IOWR(DRM_COMMAND_BASE + > > DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext) > > + > > +/** > > + * struct drm_i915_gem_create_ext > > I think some explanation here that all new bo flags will be added here, > and that in general we're phasing out the various SET/GET ioctls. > > > + */ > > +struct drm_i915_gem_create_ext { > > + /** > > + * @size: Requested size for the object. > > + * > > + * The (page-aligned) allocated size for the object will be returned. > > + */ > > + __u64 size; > > + /** > > + * @handle: Returned handle for the object. > > + * > > + * Object handles are nonzero. > > + */ > > + __u32 handle; > > + /** @flags: MBZ */ > > + __u32 flags; > > + /** > > + * @extensions: > > + * For I915_GEM_CREATE_EXT_SETPARAM extension usage see both: > > + * struct drm_i915_gem_create_ext_setparam. > > + *
Re: [PATCH v3 4/4] drm/doc/rfc: i915 DG1 uAPI
On Thu, Apr 15, 2021 at 11:04 AM Matthew Auld wrote: > > Add an entry for the new uAPI needed for DG1. > > v2(Daniel): > - include the overall upstreaming plan > - add a note for mmap, there are differences here for TTM vs i915 > - bunch of other suggestions from Daniel > v3: > (Daniel) > - add a note for set/get caching stuff > - add some more docs for existing query and extensions stuff > - add an actual code example for regions query > - bunch of other stuff > (Jason) > - uAPI change(!): > - try a simpler design with the placements extension > - rather than have a generic setparam which can cover multiple > use cases, have each extension be responsible for one thing > only > > Signed-off-by: Matthew Auld > Cc: Joonas Lahtinen > Cc: Jordan Justen > Cc: Daniel Vetter > Cc: Kenneth Graunke > Cc: Jason Ekstrand > Cc: Dave Airlie > Cc: dri-devel@lists.freedesktop.org > Cc: mesa-...@lists.freedesktop.org > --- > Documentation/gpu/rfc/i915_gem_lmem.h | 255 > Documentation/gpu/rfc/i915_gem_lmem.rst | 139 + > Documentation/gpu/rfc/index.rst | 4 + > 3 files changed, 398 insertions(+) > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > b/Documentation/gpu/rfc/i915_gem_lmem.h > new file mode 100644 > index ..2a82a452e9f2 > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > @@ -0,0 +1,255 @@ > +/* > + * Note that drm_i915_query_item and drm_i915_query are existing bits of > uAPI. > + * For the regions query we are just adding a new query id, so no actual new > + * ioctl or anything, but including it here for reference. > + */ > +struct drm_i915_query_item { > +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > + > +__u64 query_id; > + > +/* > + * When set to zero by userspace, this is filled with the size of the > + * data to be written at the data_ptr pointer. 
The kernel sets this > + * value to a negative value to signal an error on a particular query > + * item. > + */ > +__s32 length; > + > +__u32 flags; > +/* > + * Data will be written at the location pointed by data_ptr when the > + * value of length matches the length of the data to be written by > the > + * kernel. > + */ > +__u64 data_ptr; > +}; > + > +struct drm_i915_query { > +__u32 num_items; > +/* > + * Unused for now. Must be cleared to zero. > + */ > +__u32 flags; > +/* > + * This points to an array of num_items drm_i915_query_item > structures. > + */ > +__u64 items_ptr; > +}; > + > +#define DRM_IOCTL_I915_QUERY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, > struct drm_i915_query) > + > +/** > + * enum drm_i915_gem_memory_class > + */ > +enum drm_i915_gem_memory_class { > + /** @I915_MEMORY_CLASS_SYSTEM: system memory */ > + I915_MEMORY_CLASS_SYSTEM = 0, > + /** @I915_MEMORY_CLASS_DEVICE: device local-memory */ > + I915_MEMORY_CLASS_DEVICE, > +}; > + > +/** > + * struct drm_i915_gem_memory_class_instance > + */ > +struct drm_i915_gem_memory_class_instance { > + /** @memory_class: see enum drm_i915_gem_memory_class */ > + __u16 memory_class; > + > + /** @memory_instance: which instance */ > + __u16 memory_instance; > +}; > + > +/** > + * struct drm_i915_memory_region_info > + * > + * Describes one region as known to the driver. > + * > + * Note that we reserve quite a lot of stuff here for potential future work. > As > + * an example we might want expose the capabilities(see caps) for a given > + * region, which could include things like if the region is CPU > + * mappable/accessible etc. I get caps but I'm seriously at a loss as to what the rest of this would be used for. Why are caps and flags both there and separate? Flags are typically something you set, not query. Also, what's with rsvd1 at the end? This smells of substantial over-building to me. 
I thought to myself, "maybe I'm missing a future use-case" so I looked at the internal tree and none of this is being used there either. This indicates to me that either I'm missing something and there's code somewhere I don't know about or, with three years of building on internal branches, we still haven't proven that any of this is needed. If it's the latter, which I strongly su
Re: [PATCH 3/4] drm/i915/uapi: convert i915_user_extension to kernel doc
On April 16, 2021 05:37:52 Matthew Auld wrote: Add some example usage for the extension chaining also, which is quite nifty. v2: (Daniel) - clarify that the name is just some integer, also document that the name space is not global Suggested-by: Daniel Vetter Signed-off-by: Matthew Auld Cc: Joonas Lahtinen Cc: Jordan Justen Cc: Daniel Vetter Cc: Kenneth Graunke Cc: Jason Ekstrand Cc: Dave Airlie Cc: dri-devel@lists.freedesktop.org Cc: mesa-...@lists.freedesktop.org Reviewed-by: Daniel Vetter --- include/uapi/drm/i915_drm.h | 54 ++--- 1 file changed, 50 insertions(+), 4 deletions(-) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 92da48e935d1..d79b51c12ff2 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -62,8 +62,8 @@ extern "C" { #define I915_ERROR_UEVENT "ERROR" #define I915_RESET_UEVENT "RESET" -/* - * i915_user_extension: Base class for defining a chain of extensions +/** + * struct i915_user_extension - Base class for defining a chain of extensions * * Many interfaces need to grow over time. In most cases we can simply * extend the struct and have userspace pass in more data. Another option, @@ -76,12 +76,58 @@ extern "C" { * increasing complexity, and for large parts of that interface to be * entirely optional. The downside is more pointer chasing; chasing across * the __user boundary with pointers encapsulated inside u64. + * + * Example chaining: + * + * .. code-block:: C + * + * struct i915_user_extension ext3 { + * .next_extension = 0, // end + * .name = ..., + * }; + * struct i915_user_extension ext2 { + * .next_extension = (uintptr_t)&ext3, + * .name = ..., + * }; + * struct i915_user_extension ext1 { + * .next_extension = (uintptr_t)&ext2, + * .name = ..., + * }; + * + * Typically the i915_user_extension would be embedded in some uAPI struct, and + * in this case we would feed it the head of the chain(i.e ext1), which would + * then apply all of the above extensions. 
+ * */ struct i915_user_extension { + /** + * @next_extension: + * + * Pointer to the next struct i915_user_extension, or zero if the end. + */ __u64 next_extension; + /** + * @name: Name of the extension. + * + * Note that the name here is just some integer. + * + * Also note that the name space for this is not global for the whole + * driver, but rather its scope/meaning is limited to the specific piece + * of uAPI which has embedded the struct i915_user_extension. We may want to rethink this decision. In Vulkan, we have several cases where we use the same extension multiple places. Given that the only extensible thing currently landed is context creation, we still could make this global. Then again, forcing us to go through the exercise of redefining every time has its advantages too. In any case, this is a correct description of the current state of affairs, so Reviewed-by: Jason Ekstrand + */ __u32 name; - __u32 flags; /* All undefined bits must be zero. */ - __u32 rsvd[4]; /* Reserved for future use; must be zero. */ + /** + * @flags: MBZ + * + * All undefined bits must be zero. + */ + __u32 flags; + /** + * @rsvd: MBZ + * + * Reserved for future use; must be zero. + */ + __u32 rsvd[4]; }; /* -- 2.26.3 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 4/4] drm/i915/uapi: convert i915_query and friend to kernel doc
Reviewed-by: Jason Ekstrand On April 16, 2021 05:37:55 Matthew Auld wrote: Add a note about the two-step process. v2(Tvrtko): - Also document the other method of just passing in a buffer which is large enough, which avoids two ioctl calls. Can make sense for smaller query items. Suggested-by: Daniel Vetter Signed-off-by: Matthew Auld Cc: Joonas Lahtinen Cc: Tvrtko Ursulin Cc: Jordan Justen Cc: Daniel Vetter Cc: Kenneth Graunke Cc: Jason Ekstrand Cc: Dave Airlie Cc: dri-devel@lists.freedesktop.org Cc: mesa-...@lists.freedesktop.org Reviewed-by: Daniel Vetter --- include/uapi/drm/i915_drm.h | 61 ++--- 1 file changed, 50 insertions(+), 11 deletions(-) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index d79b51c12ff2..12f375c52317 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2218,14 +2218,23 @@ struct drm_i915_perf_oa_config { __u64 flex_regs_ptr; }; +/** + * struct drm_i915_query_item - An individual query for the kernel to process. + * + * The behaviour is determined by the @query_id. Note that exactly what + * @data_ptr is also depends on the specific @query_id. + */ struct drm_i915_query_item { + /** @query_id: The id for this query */ __u64 query_id; #define DRM_I915_QUERY_TOPOLOGY_INFO1 #define DRM_I915_QUERY_ENGINE_INFO 2 #define DRM_I915_QUERY_PERF_CONFIG 3 /* Must be kept compact -- no holes and well documented */ - /* + /** +* @length: +* * When set to zero by userspace, this is filled with the size of the * data to be written at the data_ptr pointer. The kernel sets this * value to a negative value to signal an error on a particular query @@ -2233,21 +2242,26 @@ struct drm_i915_query_item { */ __s32 length; - /* + /** +* @flags: +* * When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0. 
* * When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the - * following : - * - DRM_I915_QUERY_PERF_CONFIG_LIST - * - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID - * - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID + * following: + * + * - DRM_I915_QUERY_PERF_CONFIG_LIST + * - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID + * - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID */ __u32 flags; #define DRM_I915_QUERY_PERF_CONFIG_LIST 1 #define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2 #define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID 3 - /* + /** + * @data_ptr: + * * Data will be written at the location pointed by data_ptr when the * value of length matches the length of the data to be written by the * kernel. @@ -2255,16 +2269,41 @@ struct drm_i915_query_item { __u64 data_ptr; }; +/** + * struct drm_i915_query - Supply an array of drm_i915_query_item for the kernel + * to fill out. + * + * Note that this is generally a two-step process for each drm_i915_query_item + * in the array: + * + * 1. Call the DRM_IOCTL_I915_QUERY, giving it our array of drm_i915_query_item, + *    with drm_i915_query_item.length set to zero. The kernel will then fill in + *    the size, in bytes, which tells userspace how much memory it needs to + *    allocate for the blob (say for an array of properties). + * + * 2. Next we call DRM_IOCTL_I915_QUERY again, this time with the + *    drm_i915_query_item.data_ptr equal to our newly allocated blob. Note that + *    the drm_i915_query_item.length should still be the same as what the kernel + *    previously set. At this point the kernel can fill in the blob. + * + * Note that for some query items it can make sense for userspace to just pass + * in a buffer/blob equal to or larger than the required size. In this case only + * a single ioctl call is needed. For some smaller query items this can work + * quite well. + */ struct drm_i915_query { + /** @num_items: The number of elements in the @items_ptr array */ __u32 num_items; - /* - * Unused for now. Must be cleared to zero. + /** + * @flags: Unused for now.
Must be cleared to zero. */ __u32 flags; - /* - * This points to an array of num_items drm_i915_query_item structures. + /** + * @items_ptr: This points to an array of num_items drm_i915_query_item + * structures. */ __u64 items_ptr; }; -- 2.26.3
Re: [PATCH v3 4/4] drm/doc/rfc: i915 DG1 uAPI
On Mon, Apr 19, 2021 at 7:02 AM Matthew Auld wrote: > > On 16/04/2021 17:38, Jason Ekstrand wrote: > > On Thu, Apr 15, 2021 at 11:04 AM Matthew Auld > > wrote: > >> > >> Add an entry for the new uAPI needed for DG1. > >> > >> v2(Daniel): > >>- include the overall upstreaming plan > >>- add a note for mmap, there are differences here for TTM vs i915 > >>- bunch of other suggestions from Daniel > >> v3: > >> (Daniel) > >>- add a note for set/get caching stuff > >>- add some more docs for existing query and extensions stuff > >>- add an actual code example for regions query > >>- bunch of other stuff > >> (Jason) > >>- uAPI change(!): > >> - try a simpler design with the placements extension > >> - rather than have a generic setparam which can cover multiple > >>use cases, have each extension be responsible for one thing > >>only > >> > >> Signed-off-by: Matthew Auld > >> Cc: Joonas Lahtinen > >> Cc: Jordan Justen > >> Cc: Daniel Vetter > >> Cc: Kenneth Graunke > >> Cc: Jason Ekstrand > >> Cc: Dave Airlie > >> Cc: dri-devel@lists.freedesktop.org > >> Cc: mesa-...@lists.freedesktop.org > >> --- > >> Documentation/gpu/rfc/i915_gem_lmem.h | 255 > >> Documentation/gpu/rfc/i915_gem_lmem.rst | 139 + > >> Documentation/gpu/rfc/index.rst | 4 + > >> 3 files changed, 398 insertions(+) > >> create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > >> create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > >> > >> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > >> b/Documentation/gpu/rfc/i915_gem_lmem.h > >> new file mode 100644 > >> index ..2a82a452e9f2 > >> --- /dev/null > >> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > >> @@ -0,0 +1,255 @@ > >> +/* > >> + * Note that drm_i915_query_item and drm_i915_query are existing bits of > >> uAPI. > >> + * For the regions query we are just adding a new query id, so no actual > >> new > >> + * ioctl or anything, but including it here for reference. 
> >> + */ > >> +struct drm_i915_query_item { > >> +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > >> + > >> +__u64 query_id; > >> + > >> +/* > >> + * When set to zero by userspace, this is filled with the size of > >> the > >> + * data to be written at the data_ptr pointer. The kernel sets > >> this > >> + * value to a negative value to signal an error on a particular > >> query > >> + * item. > >> + */ > >> +__s32 length; > >> + > >> +__u32 flags; > >> +/* > >> + * Data will be written at the location pointed by data_ptr when > >> the > >> + * value of length matches the length of the data to be written > >> by the > >> + * kernel. > >> + */ > >> +__u64 data_ptr; > >> +}; > >> + > >> +struct drm_i915_query { > >> +__u32 num_items; > >> +/* > >> + * Unused for now. Must be cleared to zero. > >> + */ > >> +__u32 flags; > >> +/* > >> + * This points to an array of num_items drm_i915_query_item > >> structures. > >> + */ > >> +__u64 items_ptr; > >> +}; > >> + > >> +#define DRM_IOCTL_I915_QUERY DRM_IOWR(DRM_COMMAND_BASE + > >> DRM_I915_QUERY, struct drm_i915_query) > >> + > >> +/** > >> + * enum drm_i915_gem_memory_class > >> + */ > >> +enum drm_i915_gem_memory_class { > >> + /** @I915_MEMORY_CLASS_SYSTEM: system memory */ > >> + I915_MEMORY_CLASS_SYSTEM = 0, > >> + /** @I915_MEMORY_CLASS_DEVICE: device local-memory */ > >> + I915_MEMORY_CLASS_DEVICE, > >> +}; > >> + > >> +/** > >> + * struct drm_i915_gem_memory_class_instance > >> + */ > >> +struct drm_i915_gem_memory_class_instance { > >> + /** @memory_class: see enum drm_i915_gem_memory_class */ > >> + __u16 memory_class; > >> + > >> + /** @memory_inst
Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
Not going to comment on everything on the first pass... On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák wrote: > > Hi, > > This is our initial proposal for explicit fences everywhere and new memory > management that doesn't use BO fences. It's a redesign of how Linux graphics > drivers work, and it can coexist with what we have now. > > > 1. Introduction > (skip this if you are already sold on explicit fences) > > The current Linux graphics architecture was initially designed for GPUs with > only one graphics queue where everything was executed in the submission order > and per-BO fences were used for memory management and CPU-GPU > synchronization, not GPU-GPU synchronization. Later, multiple queues were > added on top, which required the introduction of implicit GPU-GPU > synchronization between queues of different processes using per-BO fences. > Recently, even parallel execution within one queue was enabled where a > command buffer starts draws and compute shaders, but doesn't wait for them, > enabling parallelism between back-to-back command buffers. Modesetting also > uses per-BO fences for scheduling flips. Our GPU scheduler was created to > enable all those use cases, and it's the only reason why the scheduler exists. > > The GPU scheduler, implicit synchronization, BO-fence-based memory > management, and the tracking of per-BO fences increase CPU overhead and > latency, and reduce parallelism. There is a desire to replace all of them > with something much simpler. Below is how we could do it. > > > 2. Explicit synchronization for window systems and modesetting > > The producer is an application and the consumer is a compositor or a > modesetting driver. > > 2.1. The Present request > > As part of the Present request, the producer will pass 2 fences (sync > objects) to the consumer alongside the presented DMABUF BO: > - The submit fence: Initially unsignalled, it will be signalled when the > producer has finished drawing into the presented buffer. 
> - The return fence: Initially unsignalled, it will be signalled when the > consumer has finished using the presented buffer. I'm not sure syncobj is what we want. In the Intel world we're trying to go even further to something we're calling "userspace fences" which are a timeline implemented as a single 64-bit value in some CPU-mappable BO. The client writes a higher value into the BO to signal the timeline. The kernel then provides some helpers for waiting on them reliably and without spinning. I don't expect everyone to support these right away, but if we're going to re-plumb userspace for explicit synchronization, I'd like to make sure we take this into account so we only have to do it once. > Deadlock mitigation to recover from segfaults: > - The kernel knows which process is obliged to signal which fence. This > information is part of the Present request and supplied by userspace. This isn't clear to me. Yes, if we're using anything dma-fence based like syncobj, this is true. But it doesn't seem totally true as a general statement. > - If the producer crashes, the kernel signals the submit fence, so that the > consumer can make forward progress. > - If the consumer crashes, the kernel signals the return fence, so that the > producer can reclaim the buffer. > - A GPU hang signals all fences. Other deadlocks will be handled like GPU > hangs. What do you mean by "all"? All fences that were supposed to be signaled by the hung context? > > Other window system requests can follow the same idea. > > Merged fences where one fence object contains multiple fences will be > supported. A merged fence is signalled only when its fences are signalled. > The consumer will have the option to redefine the unsignalled return fence to > a merged fence. > > 2.2. Modesetting > > Since a modesetting driver can also be the consumer, the present ioctl will > contain a submit fence and a return fence too.
One small problem with this is > that userspace can hang the modesetting driver, but in theory, any later > present ioctl can override the previous one, so the unsignalled presentation > is never used. > > > 3. New memory management > > The per-BO fences will be removed and the kernel will not know which buffers > are busy. This will reduce CPU overhead and latency. The kernel will not need > per-BO fences with explicit synchronization, so we just need to remove their > last user: buffer evictions. It also resolves the current OOM deadlock. Is this even really possible? I'm no kernel MM expert (trying to learn some) but my understanding is that the use of per-BO dma-fence runs deep. I would like to stop using it for implicit synchronization to be sure, but I'm not sure I believe the claim that we can get rid of it entirely. Happy to see someone try, though. > 3.1. Evictions > > If the kernel wants to move a buffer, it will have to wait for everything to > go idle, halt all userspace command submissions, move the buffer,
Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
It's still early in the morning here and I'm not awake yet so sorry if this comes out in bits and pieces... On Tue, Apr 20, 2021 at 7:43 AM Daniel Stone wrote: > > Hi Marek, > > On Mon, 19 Apr 2021 at 11:48, Marek Olšák wrote: >> >> 2. Explicit synchronization for window systems and modesetting >> >> The producer is an application and the consumer is a compositor or a >> modesetting driver. >> >> 2.1. The Present request > > > So the 'present request' is an ioctl, right? Not a userspace construct like > it is today? If so, how do we correlate the two? > > The terminology is pretty X11-centric so I'll assume that's what you've > designed against, but Wayland and even X11 carry much more auxiliary > information attached to a present request than just 'this buffer, this > swapchain'. Wayland latches a lot of data on presentation, including > non-graphics data such as surface geometry (so we can have resizes which > don't suck), window state (e.g. fullscreen or not, also so we can have > resizes which don't suck), and these requests can also cascade through a tree > of subsurfaces (so we can have embeds which don't suck). X11 mostly just > carries timestamps, which is more tractable. > > Given we don't want to move the entirety of Wayland into kernel-visible > objects, how do we synchronise the two streams so they aren't incoherent? > Taking a rough stab at it whilst assuming we do have > DRM_IOCTL_NONMODE_PRESENT, this would create a present object somewhere in > kernel space, which the producer would create and ?? export a FD from, that > the compositor would ?? import. > >> As part of the Present request, the producer will pass 2 fences (sync >> objects) to the consumer alongside the presented DMABUF BO: >> - The submit fence: Initially unsignalled, it will be signalled when the >> producer has finished drawing into the presented buffer. > > > We have already have this in Wayland through dma_fence. 
I'm relaxed about > this becoming drm_syncobj or drm_newmappedysncobjthing, it's just a matter of > typing. X11 has patches to DRI3 to support dma_fence, but they never got > merged because it was far too invasive to a server which is no longer > maintained. > >> >> - The return fence: Initially unsignalled, it will be signalled when the >> consumer has finished using the presented buffer. > > > Currently in Wayland the return fence (again a dma_fence) is generated by the > compositor and sent as an event when it's done, because we can't have > speculative/empty/future fences. drm_syncobj would make this possible, but so > far I've been hesitant because I don't see the benefit to it (more below). > >> >> Deadlock mitigation to recover from segfaults: >> - The kernel knows which process is obliged to signal which fence. This >> information is part of the Present request and supplied by userspace. > > > Same as today with dma_fence. Less true with drm_syncobj if we're using > timelines. > >> >> - If the producer crashes, the kernel signals the submit fence, so that the >> consumer can make forward progress. > > > This is only a change if the producer is now allowed to submit a fence before > it's flushed the work which would eventually fulfill that fence. Using > dma_fence has so far isolated us from this. > >> >> - If the consumer crashes, the kernel signals the return fence, so that the >> producer can reclaim the buffer. > > > 'The consumer' is problematic, per below. I think the wording you want is 'if > no references are held to the submitted present object'. > >> >> - A GPU hang signals all fences. Other deadlocks will be handled like GPU >> hangs. >> >> Other window system requests can follow the same idea. > > > Which other window system requests did you have in mind? Again, moving the > entirety of Wayland's signaling into the kernel is a total non-starter. 
> Partly because it means our entire protocol would be subject to the kernel's > ABI rules, partly because the rules and interdependencies between the > requests are extremely complex, but mostly because the kernel is just a > useless proxy: it would be forced to do significant work to reason about what > those requests do and when they should happen, but wouldn't be able to make > those decisions itself so would have to just punt everything to userspace. > Unless we have eBPF compositors. > >> >> Merged fences where one fence object contains multiple fences will be >> supported. A merged fence is signalled only when its fences are signalled. >> The consumer will have the option to redefine the unsignalled return fence >> to a merged fence. > > > An elaboration of how this differed from drm_syncobj would be really helpful > here. I can make some guesses based on the rest of the mail, but I'm not sure > how accurate they are. > >> >> 2.2. Modesetting >> >> Since a modesetting driver can also be the consumer, the present ioctl will >> contain a submit fence and a return fence too. One small problem with th
Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
Sorry for the mega-reply but timezones... On Tue, Apr 20, 2021 at 6:59 AM Christian König wrote: > > > Yeah. If we go with userspace fences, then userspace can hang itself. Not > > the kernel's problem. > > Well, the path of inner peace begins with four words. “Not my fucking > problem.” 🧘 > But I'm not that much concerned about the kernel, but rather about > important userspace processes like X, Wayland, SurfaceFlinger etc... > > I mean attaching a page to a sync object and allowing to wait/signal > from both CPU as well as GPU side is not so much of a problem. Yup... Sorting out these issues is what makes this a hard problem. > > You have to somehow handle that, e.g. perhaps with conditional > > rendering and just using the old frame in compositing if the new one > > doesn't show up in time. > > Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level. "Just handle it with conditional rendering" is a pretty trite answer. If we have memory fences, we could expose a Vulkan extension to allow them to be read by conditional rendering or by a shader. However, as Daniel has pointed out multiple times, composition pipelines are long and complex and cheap tricks like that aren't something we can rely on for solving the problem. If we're going to solve the problem, we need to make driver-internal stuff nice while still providing something that looks very much like a sync_file with finite time semantics to the composition pipeline. How? That's the question. > Regards, > Christian. > > Am 20.04.21 um 13:16 schrieb Daniel Vetter: > > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote: > >> Daniel, are you suggesting that we should skip any deadlock prevention in > >> the kernel, and just let userspace wait for and signal any fence it has > >> access to? > > Yeah. If we go with userspace fences, then userspace can hang itself. Not > > the kernel's problem. 
The only criteria is that the kernel itself must > > never rely on these userspace fences, except for stuff like implementing > > optimized cpu waits. And in those we must always guarantee that the > > userspace process remains interruptible. > > > > It's a completely different world from dma_fence based kernel fences, > > whether those are implicit or explicit. > > > >> Do you have any concern with the deprecation/removal of BO fences in the > >> kernel assuming userspace is only using explicit fences? Any concern with > >> the submit and return fences for modesetting and other producer<->consumer > >> scenarios? > > Let me work on the full replay for your rfc first, because there's a lot > > of details here and nuance. > > -Daniel > > > >> Thanks, > >> Marek > >> > >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter wrote: > >> > >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König > >>> wrote: > >>>> Am 19.04.21 um 17:48 schrieb Jason Ekstrand: > >>>>> Not going to comment on everything on the first pass... > >>>>> > >>>>> On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák wrote: > >>>>>> Hi, > >>>>>> > >>>>>> This is our initial proposal for explicit fences everywhere and new > >>> memory management that doesn't use BO fences. It's a redesign of how Linux > >>> graphics drivers work, and it can coexist with what we have now. > >>>>>> > >>>>>> 1. Introduction > >>>>>> (skip this if you are already sold on explicit fences) > >>>>>> > >>>>>> The current Linux graphics architecture was initially designed for > >>> GPUs with only one graphics queue where everything was executed in the > >>> submission order and per-BO fences were used for memory management and > >>> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple > >>> queues were added on top, which required the introduction of implicit > >>> GPU-GPU synchronization between queues of different processes using per-BO > >>> fences. 
Recently, even parallel execution within one queue was enabled > >>> where a command buffer starts draws and compute shaders, but doesn't wait > >>> for them, enabling parallelism between back-to-back command buffers. > >>> Modesetting also uses per-BO fences for scheduling flips. Our GPU > >>> scheduler > >>> was created to enable all those use cases, and it's the only reason why
Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
On Tue, Apr 20, 2021 at 9:10 AM Daniel Vetter wrote: > > On Tue, Apr 20, 2021 at 1:59 PM Christian König > wrote: > > > > > Yeah. If we go with userspace fences, then userspace can hang itself. Not > > > the kernel's problem. > > > > Well, the path of inner peace begins with four words. “Not my fucking > > problem.” > > > > But I'm not that much concerned about the kernel, but rather about > > important userspace processes like X, Wayland, SurfaceFlinger etc... > > > > I mean attaching a page to a sync object and allowing to wait/signal > > from both CPU as well as GPU side is not so much of a problem. > > > > > You have to somehow handle that, e.g. perhaps with conditional > > > rendering and just using the old frame in compositing if the new one > > > doesn't show up in time. > > > > Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level. > > For opengl we do all the same guarantees, so if you get one of these > you just block until the fence is signalled. Doing that properly means > submit thread to support drm_syncobj like for vulkan. > > For vulkan we probably want to represent these as proper vk timeline > objects, and the vulkan way is to just let the application (well > compositor) here deal with it. If they import timelines from untrusted > other parties, they need to handle the potential fallback of being > lied at. How is "not vulkan's fucking problem", because that entire > "with great power (well performance) comes great responsibility" is > the entire vk design paradigm. The security aspects are currently an unsolved problem in Vulkan. The assumption is that everyone trusts everyone else to be careful with the scissors. It's a great model! I think we can do something in Vulkan to allow apps to protect themselves a bit but it's tricky and non-obvious. --Jason > Glamour will just rely on GL providing nice package of the harsh > reality of gpus, like usual. 
> > So I guess step 1 here for GL would be to provide some kind of > import/export of timeline syncobj, including properly handling this > "future/indefinite fences" aspect of them with submit thread and > everything. > > -Daniel > > > > > Regards, > > Christian. > > > > Am 20.04.21 um 13:16 schrieb Daniel Vetter: > > > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote: > > >> Daniel, are you suggesting that we should skip any deadlock prevention in > > >> the kernel, and just let userspace wait for and signal any fence it has > > >> access to? > > > Yeah. If we go with userspace fences, then userspace can hang itself. Not > > > the kernel's problem. The only criteria is that the kernel itself must > > > never rely on these userspace fences, except for stuff like implementing > > > optimized cpu waits. And in those we must always guarantee that the > > > userspace process remains interruptible. > > > > > > It's a completely different world from dma_fence based kernel fences, > > > whether those are implicit or explicit. > > > > > >> Do you have any concern with the deprecation/removal of BO fences in the > > >> kernel assuming userspace is only using explicit fences? Any concern with > > >> the submit and return fences for modesetting and other > > >> producer<->consumer > > >> scenarios? > > > Let me work on the full replay for your rfc first, because there's a lot > > > of details here and nuance. > > > -Daniel > > > > > >> Thanks, > > >> Marek > > >> > > >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter wrote: > > >> > > >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König > > >>> wrote: > > >>>> Am 19.04.21 um 17:48 schrieb Jason Ekstrand: > > >>>>> Not going to comment on everything on the first pass... > > >>>>> > > >>>>> On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák wrote: > > >>>>>> Hi, > > >>>>>> > > >>>>>> This is our initial proposal for explicit fences everywhere and new > > >>> memory management that doesn't use BO fences. 
It's a redesign of how > > >>> Linux > > >>> graphics drivers work, and it can coexist with what we have now. > > >>>>>> > > >>>>>> 1. Introduction > > >>>>>> (skip this if you are
Re: [PATCH v3 4/4] drm/doc/rfc: i915 DG1 uAPI
On Tue, Apr 20, 2021 at 11:34 AM Tvrtko Ursulin wrote: > > > On 19/04/2021 16:19, Jason Ekstrand wrote: > > On Mon, Apr 19, 2021 at 7:02 AM Matthew Auld wrote: > >> > >> On 16/04/2021 17:38, Jason Ekstrand wrote: > >>> On Thu, Apr 15, 2021 at 11:04 AM Matthew Auld > >>> wrote: > >>>> > >>>> Add an entry for the new uAPI needed for DG1. > >>>> > >>>> v2(Daniel): > >>>> - include the overall upstreaming plan > >>>> - add a note for mmap, there are differences here for TTM vs i915 > >>>> - bunch of other suggestions from Daniel > >>>> v3: > >>>>(Daniel) > >>>> - add a note for set/get caching stuff > >>>> - add some more docs for existing query and extensions stuff > >>>> - add an actual code example for regions query > >>>> - bunch of other stuff > >>>>(Jason) > >>>> - uAPI change(!): > >>>> - try a simpler design with the placements extension > >>>> - rather than have a generic setparam which can cover multiple > >>>> use cases, have each extension be responsible for one thing > >>>> only > >>>> > >>>> Signed-off-by: Matthew Auld > >>>> Cc: Joonas Lahtinen > >>>> Cc: Jordan Justen > >>>> Cc: Daniel Vetter > >>>> Cc: Kenneth Graunke > >>>> Cc: Jason Ekstrand > >>>> Cc: Dave Airlie > >>>> Cc: dri-devel@lists.freedesktop.org > >>>> Cc: mesa-...@lists.freedesktop.org > >>>> --- > >>>>Documentation/gpu/rfc/i915_gem_lmem.h | 255 > >>>>Documentation/gpu/rfc/i915_gem_lmem.rst | 139 + > >>>>Documentation/gpu/rfc/index.rst | 4 + > >>>>3 files changed, 398 insertions(+) > >>>>create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > >>>>create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > >>>> > >>>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > >>>> b/Documentation/gpu/rfc/i915_gem_lmem.h > >>>> new file mode 100644 > >>>> index ..2a82a452e9f2 > >>>> --- /dev/null > >>>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > >>>> @@ -0,0 +1,255 @@ > >>>> +/* > >>>> + * Note that drm_i915_query_item and drm_i915_query are existing bits > >>>> of uAPI. 
> >>>> + * For the regions query we are just adding a new query id, so no > >>>> actual new > >>>> + * ioctl or anything, but including it here for reference. > >>>> + */ > >>>> +struct drm_i915_query_item { > >>>> +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > >>>> + > >>>> +__u64 query_id; > >>>> + > >>>> +/* > >>>> + * When set to zero by userspace, this is filled with the size > >>>> of the > >>>> + * data to be written at the data_ptr pointer. The kernel sets > >>>> this > >>>> + * value to a negative value to signal an error on a particular > >>>> query > >>>> + * item. > >>>> + */ > >>>> +__s32 length; > >>>> + > >>>> +__u32 flags; > >>>> +/* > >>>> + * Data will be written at the location pointed by data_ptr > >>>> when the > >>>> + * value of length matches the length of the data to be written > >>>> by the > >>>> + * kernel. > >>>> + */ > >>>> +__u64 data_ptr; > >>>> +}; > >>>> + > >>>> +struct drm_i915_query { > >>>> +__u32 num_items; > >>>> +/* > >>>> + * Unused for now. Must be cleared to zero. > >>>> + */ > >>>> +__u32 flags; > >>>> +/* > >>>> + * This points to an array of num_items drm_i915_query_item > >>>> structures. > >>>> + */ > >>>> +__u64 items_ptr; > >>>> +}; > >
Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal
On Tue, Apr 20, 2021 at 1:54 PM Daniel Vetter wrote: > > On Tue, Apr 20, 2021 at 7:45 PM Daniel Stone wrote: > > > And something more concrete: > > > > dma_fence. > > > > This already has all of the properties described above. Kernel-wise, it > > already devolves to CPU-side signaling when it crosses device boundaries. > > We need to support it roughly forever since it's been plumbed so far and so > > wide. Any primitive which is acceptable for winsys-like usage which crosses > > so many device/subsystem/process/security boundaries has to meet the same > > requirements. So why reinvent something which looks so similar, and has the > > same requirements of the kernel babysitting completion, providing little to > > no benefit for that difference? > > So I can mostly get behind this, except it's _not_ going to be > dma_fence. That thing has horrendous internal ordering constraints > within the kernel, and the one thing that doesn't allow you is to make > a dma_fence depend upon a userspace fence. Let me elaborate on this a bit. One of the problems I mentioned earlier is the conflation of fence types inside the kernel. dma_fence is used for solving two semi-related but different problems: client command synchronization and memory residency synchronization. In the old implicit GL world, we conflated these two and thought we were providing ourselves a service. Not so much. It's all well and good to say that we should turn the memory fence into a dma_fence and throw a timeout on it. However, these window-system sync primitives, as you said, have to be able to be shared across everything. In particular, we have to be able to share them with drivers that don't make a good separation between command and memory synchronization. Let's say we're rendering on ANV with memory fences and presenting on some USB display adapter whose kernel driver is a bit old-school.
When we pass that fence to the other driver via a sync_file or similar, that driver may shove that dma_fence into the dma_resv on some buffer somewhere. Then our client, completely unaware of internal kernel dependencies, binds that buffer into its address space and kicks off another command buffer. So i915 throws in a dependency on that dma_resv which contains the previously created dma_fence and refuses to execute any more command buffers until it signals. Unfortunately, unbeknownst to i915, that command buffer which the client kicked off after doing that bind was required for signaling the memory fence on which our first dma_fence depends. Deadlock. Sure, we put a timeout on the dma_fence and it will eventually fire and unblock everything. However, there's one very important point that's easy to miss here: Neither i915 nor the client did anything wrong in the above scenario. The Vulkan footgun approach works because there are a set of rules and, if you follow those rules, you're guaranteed everything works. In the above scenario, however, the client followed all of the rules and got a deadlock anyway. We can't have that. > But what we can do is use the same currently existing container > objects like drm_syncobj or sync_file (timeline syncobj would fit best > tbh), and stuff a userspace fence behind it. The only trouble is that > currently timeline syncobj implement vulkan's spec, which means if you > build a wait-before-signal deadlock, you'll wait forever. Well until > the user ragequits and kills your process. Yeah, it may be that this approach can be made to work. Instead of reusing dma_fence, maybe we can reuse syncobj and have another form of syncobj which is a memory fence, a value to wait on, and a timeout. --Jason
Re: [PATCH v3 4/4] drm/doc/rfc: i915 DG1 uAPI
On Wed, Apr 21, 2021 at 3:22 AM Tvrtko Ursulin wrote: > > On 20/04/2021 18:00, Jason Ekstrand wrote: > > On Tue, Apr 20, 2021 at 11:34 AM Tvrtko Ursulin > > wrote: > >> > >> > >> On 19/04/2021 16:19, Jason Ekstrand wrote: > >>> On Mon, Apr 19, 2021 at 7:02 AM Matthew Auld > >>> wrote: > >>>> > >>>> On 16/04/2021 17:38, Jason Ekstrand wrote: > >>>>> On Thu, Apr 15, 2021 at 11:04 AM Matthew Auld > >>>>> wrote: > >>>>>> > >>>>>> Add an entry for the new uAPI needed for DG1. > >>>>>> > >>>>>> v2(Daniel): > >>>>>> - include the overall upstreaming plan > >>>>>> - add a note for mmap, there are differences here for TTM vs i915 > >>>>>> - bunch of other suggestions from Daniel > >>>>>> v3: > >>>>>> (Daniel) > >>>>>> - add a note for set/get caching stuff > >>>>>> - add some more docs for existing query and extensions stuff > >>>>>> - add an actual code example for regions query > >>>>>> - bunch of other stuff > >>>>>> (Jason) > >>>>>> - uAPI change(!): > >>>>>>- try a simpler design with the placements extension > >>>>>>- rather than have a generic setparam which can cover > >>>>>> multiple > >>>>>> use cases, have each extension be responsible for one > >>>>>> thing > >>>>>> only > >>>>>> > >>>>>> Signed-off-by: Matthew Auld > >>>>>> Cc: Joonas Lahtinen > >>>>>> Cc: Jordan Justen > >>>>>> Cc: Daniel Vetter > >>>>>> Cc: Kenneth Graunke > >>>>>> Cc: Jason Ekstrand > >>>>>> Cc: Dave Airlie > >>>>>> Cc: dri-devel@lists.freedesktop.org > >>>>>> Cc: mesa-...@lists.freedesktop.org > >>>>>> --- > >>>>>> Documentation/gpu/rfc/i915_gem_lmem.h | 255 > >>>>>> > >>>>>> Documentation/gpu/rfc/i915_gem_lmem.rst | 139 + > >>>>>> Documentation/gpu/rfc/index.rst | 4 + > >>>>>> 3 files changed, 398 insertions(+) > >>>>>> create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > >>>>>> create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > >>>>>> > >>>>>> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > >>>>>> b/Documentation/gpu/rfc/i915_gem_lmem.h > >>>>>> new file mode 100644 > 
>>>>>> index ..2a82a452e9f2 > >>>>>> --- /dev/null > >>>>>> +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > >>>>>> @@ -0,0 +1,255 @@ > >>>>>> +/* > >>>>>> + * Note that drm_i915_query_item and drm_i915_query are existing bits > >>>>>> of uAPI. > >>>>>> + * For the regions query we are just adding a new query id, so no > >>>>>> actual new > >>>>>> + * ioctl or anything, but including it here for reference. > >>>>>> + */ > >>>>>> +struct drm_i915_query_item { > >>>>>> +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > >>>>>> + > >>>>>> +__u64 query_id; > >>>>>> + > >>>>>> +/* > >>>>>> + * When set to zero by userspace, this is filled with the > >>>>>> size of the > >>>>>> + * data to be written at the data_ptr pointer. The kernel > >>>>>> sets this > >>>>>> + * value to a negative value to signal an error on a > >>>>>> particular query > >>>>>> + * item. > >>>>>> + */ > >>>>>> +__s32 length; > >>>>>> + > >>>>>> +__u32 flags; > >>>>>> +/* > >>>>>> + * Data will be written at the location pointed by data_ptr &g
Re: [PATCH v3 4/4] drm/doc/rfc: i915 DG1 uAPI
[PATCH 01/21] drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE
This reverts commit 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction"). This API was originally added for OpenCL but the compute-runtime PR has sat open for a year without action so we can still pull it out if we want. I argue we should drop it for three reasons:

1. If the compute-runtime PR has sat open for a year, this clearly isn't that important.

2. It's a very leaky API. Ring size is an implementation detail of the current execlist scheduler and really only makes sense there. It can't apply to the older ring-buffer scheduler on pre-execlist hardware because that's shared across all contexts, and it won't apply to the GuC scheduler that's in the pipeline.

3. Having userspace set a ring size in bytes is a bad solution to the problem of having too small a ring. Userspace doesn't have the information to know how to properly set the ring size, so it's just going to detect the feature and always set it to the maximum of 512K. This is what the compute-runtime PR does. The scheduler in i915, on the other hand, does have the information to make an informed choice. It could detect if the ring size is a problem and grow it itself. Or, if that's too hard, we could just increase the default size from 16K to 32K or even 64K instead of relying on userspace to do it.

Let's drop this API for now and, if someone decides they really care about solving this problem, they can do it properly.
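For reference, the argument validation being deleted below is small. Here is a standalone sketch of the same checks, assuming I915_GTT_PAGE_SIZE is 4 KiB (which gives the 512K maximum mentioned above); this is an illustrative re-implementation, not the kernel code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative re-implementation of the checks the dropped set_ringsize()
 * performed on args->value.  I915_GTT_PAGE_SIZE is assumed to be 4 KiB;
 * 128 pages gives the 512 KiB maximum. */
#define I915_GTT_PAGE_SIZE 4096ULL

static int validate_ringsize(uint64_t value)
{
	if (value % I915_GTT_PAGE_SIZE)       /* must be page-aligned */
		return -EINVAL;
	if (value < I915_GTT_PAGE_SIZE)       /* at least one page */
		return -EINVAL;
	if (value > 128 * I915_GTT_PAGE_SIZE) /* at most 512 KiB */
		return -EINVAL;
	return 0;
}
```

Note how little there is to validate: a userspace that just probes for the feature and writes the 512K maximum exercises exactly one of these branches.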
Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/Makefile | 1 - drivers/gpu/drm/i915/gem/i915_gem_context.c | 85 +-- drivers/gpu/drm/i915/gt/intel_context_param.c | 63 -- drivers/gpu/drm/i915/gt/intel_context_param.h | 3 - include/uapi/drm/i915_drm.h | 20 + 5 files changed, 4 insertions(+), 168 deletions(-) delete mode 100644 drivers/gpu/drm/i915/gt/intel_context_param.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index d0d936d9137bc..afa22338fa343 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -88,7 +88,6 @@ gt-y += \ gt/gen8_ppgtt.o \ gt/intel_breadcrumbs.o \ gt/intel_context.o \ - gt/intel_context_param.o \ gt/intel_context_sseu.o \ gt/intel_engine_cs.o \ gt/intel_engine_heartbeat.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index fd8ee52e17a47..e52b85b8f923d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1335,63 +1335,6 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, return err; } -static int __apply_ringsize(struct intel_context *ce, void *sz) -{ - return intel_context_set_ring_size(ce, (unsigned long)sz); -} - -static int set_ringsize(struct i915_gem_context *ctx, - struct drm_i915_gem_context_param *args) -{ - if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) - return -ENODEV; - - if (args->size) - return -EINVAL; - - if (!IS_ALIGNED(args->value, I915_GTT_PAGE_SIZE)) - return -EINVAL; - - if (args->value < I915_GTT_PAGE_SIZE) - return -EINVAL; - - if (args->value > 128 * I915_GTT_PAGE_SIZE) - return -EINVAL; - - return context_apply_all(ctx, -__apply_ringsize, -__intel_context_ring_size(args->value)); -} - -static int __get_ringsize(struct intel_context *ce, void *arg) -{ - long sz; - - sz = intel_context_get_ring_size(ce); - GEM_BUG_ON(sz > INT_MAX); - - return sz; /* stop on first engine */ -} - -static int get_ringsize(struct i915_gem_context *ctx, - 
struct drm_i915_gem_context_param *args) -{ - int sz; - - if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) - return -ENODEV; - - if (args->size) - return -EINVAL; - - sz = context_apply_all(ctx, __get_ringsize, NULL); - if (sz < 0) - return sz; - - args->value = sz; - return 0; -} - int i915_gem_user_to_context_sseu(struct intel_gt *gt, const struct drm_i915_gem_context_param_sseu *user, @@ -2037,11 +1980,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_persistence(ctx, args); break; - case I915_CONTEXT_PARAM_RINGSIZE: - ret = set_ringsize(ctx, args); - break; - case I915_CONTEXT_PARAM_BAN_PERIOD: + case I915_CONTEXT_PARAM_RINGSIZE: default: ret = -EINVAL; break; @@ -2069,18 +2009,6 @@ static int create_setparam(struct i915_user_exte
[PATCH 00/21] drm/i915/gem: ioctl clean-ups
Overview:

This patch series attempts to clean up some of the IOCTL mess we've created over the last few years, the most egregious bit being context mutability. In summary, this series:

1. Drops two never-used context params: RINGSIZE and NO_ZEROMAP
2. Drops the entire CONTEXT_CLONE API
3. Implements SINGLE_TIMELINE with a syncobj instead of actually sharing intel_timeline between engines.
4. Adds a few sanity restrictions to the balancing/bonding API.
5. Implements a proto-ctx mechanism so that the engine set and VM can only be set early in the lifetime of a context, before anything ever executes on it. This effectively makes the VM and engine set immutable.

This series has been tested with IGT as well as with Iris, ANV, and the Intel media driver doing an 8K decode (which uses bonding/balancing). I've also done quite a bit of git archeology to ensure that nothing in here will break anything that's already shipped at some point in history. It's possible I've missed something, but I've dug quite a bit.

Details and motivation:
---

In very broad strokes, there's an effort going on right now within Intel to try and clean up and simplify i915 anywhere we can. We obviously don't want to break any shipping userspace but, as can be seen by this series, there's a lot i915 theoretically supports which userspace doesn't actually need. Some of this, like the two context params dropped here, were simply oversights: we went through the usual API review process and merged the i915 bits, but the userspace bits never landed for some reason. Not all are so innocent, however. For instance, there's an entire context cloning API which allows one to create a context with certain parameters "cloned" from some other context. This entire API has never been used by any userspace except IGT and there were never patches to any other userspace to use it. It never should have landed.
Also, when we added support for setting explicit engine sets and sharing VMs across contexts, people decided to do so via SET_CONTEXT_PARAM. While this allowed them to re-use existing API, it did so at the cost of making those states mutable, which leads to a plethora of potential race conditions. There were even IGT tests merged to cover some of these:

- gem_vm_create@async-destroy and gem_vm_create@destroy-race, which test swapping out the VM on a running context.
- gem_ctx_persistence@replace*, which test whether a client can escape a non-persistent context by submitting a hanging batch and then swapping out the engine set before the hang is detected.
- api_intel_bb@bb-with-vm, which tests that intel_bb_assign_vm works properly. This API is never used by any other IGT test.

There is also an entire deferred flush and set state framework in i915_gem_context.c which exists for safely swapping out the VM while there is work in-flight on a context. So, clearly, people knew that this API was inherently racy and difficult to implement, but they landed it anyway. Why? The best explanation I've been given is that it makes the API more "unified" or "symmetric" for this stuff to go through SET_CONTEXT_PARAM. It's not because any userspace actually wants to be able to swap out the VM or the set of engines on a running context. That would be utterly insane.

This patch series cleans up this particular mess by introducing the concept of an i915_gem_proto_context data structure which contains context creation information. When you initially call GEM_CONTEXT_CREATE, a proto-context is created instead of an actual context. Then, the first time something is done on the context besides SET_CONTEXT_PARAM, an actual context is created. This allows us to keep supporting the old drivers which use SET_CONTEXT_PARAM to set up the engine set (see also: media) while ensuring that, once you have an i915_gem_context, the VM and the engine set are immutable state.
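The proto-context lifecycle described above can be modeled in a few lines. This is a deliberately simplified standalone sketch, not the actual i915 implementation; the struct layouts and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model of the proto-context idea:
 * creation parameters are staged in a proto object and become
 * immutable once the real context is materialized on first use. */
struct proto_ctx { int vm_id; int num_engines; };
struct gem_ctx   { int vm_id; int num_engines; };

struct ctx_slot {
	struct proto_ctx proto;
	struct gem_ctx *ctx;	/* NULL until the context is finalized */
};

/* SET_CONTEXT_PARAM-style updates only succeed before finalization. */
static bool set_param(struct ctx_slot *s, int vm_id)
{
	if (s->ctx)
		return false;	/* already finalized: state is immutable */
	s->proto.vm_id = vm_id;
	return true;
}

/* The first execbuf (or anything else besides SET_CONTEXT_PARAM)
 * turns the proto-context into a real, immutable context. */
static struct gem_ctx *finalize(struct ctx_slot *s)
{
	if (!s->ctx) {
		s->ctx = malloc(sizeof(*s->ctx));
		s->ctx->vm_id = s->proto.vm_id;
		s->ctx->num_engines = s->proto.num_engines;
	}
	return s->ctx;
}
```

The key property is in set_param(): old userspace that configures engines/VM via SET_CONTEXT_PARAM keeps working, but only in the window before the context is first used.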
Eventually, there are more clean-ups I'd like to do on top of this which should make working with contexts inside i915 simpler and safer:

1. Move the GEM handle -> vma LUT from i915_gem_context into either i915_ppgtt or drm_i915_file_private depending on whether or not the hardware has a full PPGTT.
2. Move the delayed context destruction code into intel_context or a per-engine wrapper struct rather than i915_gem_context.
3. Get rid of the separation between context close and context destroy.
4. Get rid of the RCU on i915_gem_context.

However, these should probably be done as a separate patch series as this one is already starting to get longish, especially if you consider the 89 IGT patches that go along with it.

Test-with: 20210423214853.876911-1-ja...@jlekstrand.net

Jason Ekstrand (21): drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP drm/i915/gem: Set the watchdog
[PATCH 02/21] drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP
The idea behind this param is to support OpenCL drivers with relocations because OpenCL reserves 0x0 for NULL and, if we placed memory there, it would confuse CL kernels. It was originally sent out as part of a patch series including libdrm [1] and Beignet [2] support. However, the libdrm and Beignet patches never landed in their respective upstream projects so this API has never been used. It's never been used in Mesa or any other driver, either. Dropping this API allows us to delete a small bit of code. [1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html [2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 16 ++-- .../gpu/drm/i915/gem/i915_gem_context_types.h| 1 - drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 include/uapi/drm/i915_drm.h | 4 4 files changed, 6 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index e52b85b8f923d..35bcdeddfbf3f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1922,15 +1922,6 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, int ret = 0; switch (args->param) { - case I915_CONTEXT_PARAM_NO_ZEROMAP: - if (args->size) - ret = -EINVAL; - else if (args->value) - set_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - else - clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - break; - case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE: if (args->size) ret = -EINVAL; @@ -1980,6 +1971,7 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_persistence(ctx, args); break; + case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: case I915_CONTEXT_PARAM_RINGSIZE: default: @@ -2360,11 +2352,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, return -ENOENT; switch (args->param) { - case 
I915_CONTEXT_PARAM_NO_ZEROMAP: - args->size = 0; - args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - break; - case I915_CONTEXT_PARAM_GTT_SIZE: args->size = 0; rcu_read_lock(); @@ -2412,6 +2399,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, args->value = i915_gem_context_is_persistent(ctx); break; + case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: case I915_CONTEXT_PARAM_RINGSIZE: default: diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 340473aa70de0..5ae71ec936f7c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -129,7 +129,6 @@ struct i915_gem_context { * @user_flags: small set of booleans controlled by the user */ unsigned long user_flags; -#define UCONTEXT_NO_ZEROMAP0 #define UCONTEXT_NO_ERROR_CAPTURE 1 #define UCONTEXT_BANNABLE 2 #define UCONTEXT_RECOVERABLE 3 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 297143511f99b..b812f313422a9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -290,7 +290,6 @@ struct i915_execbuffer { struct intel_context *reloc_context; u64 invalid_flags; /** Set of execobj.flags that are invalid */ - u32 context_flags; /** Set of execobj.flags to insert from the ctx */ u64 batch_len; /** Length of batch within object */ u32 batch_start_offset; /** Location within object of batch */ @@ -541,9 +540,6 @@ eb_validate_vma(struct i915_execbuffer *eb, entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP; } - if (!(entry->flags & EXEC_OBJECT_PINNED)) - entry->flags |= eb->context_flags; - return 0; } @@ -750,10 +746,6 @@ static int eb_select_context(struct i915_execbuffer *eb) if (rcu_access_pointer(ctx->vm)) eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT; - eb->context_flags = 0; - if 
(test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags)) - eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS; - return 0; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index
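For reference, the behaviour being removed is small: at execbuf time, any object the user had not explicitly pinned picked up a placement bias so that nothing could land at GPU address 0 (which OpenCL reserves for NULL). A standalone toy model follows; the flag bit values and the function name are illustrative, not the real uAPI definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits -- NOT the real i915 uAPI values. */
#define EXEC_OBJECT_PINNED (1u << 0)
#define OBJECT_NEEDS_BIAS  (1u << 1)

/* Toy model of what NO_ZEROMAP did in eb_validate_vma(): objects the
 * user did not pin get a bias flag so they are never placed at GPU
 * address 0. */
static uint32_t apply_no_zeromap(uint32_t entry_flags, bool no_zeromap)
{
	if (no_zeromap && !(entry_flags & EXEC_OBJECT_PINNED))
		entry_flags |= OBJECT_NEEDS_BIAS;
	return entry_flags;
}
```

Since no userspace ever set the param, the `no_zeromap` branch was dead in practice, which is why the whole path can be deleted.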
[PATCH 03/21] drm/i915/gem: Set the watchdog timeout directly in intel_context_set_gem
Instead of handling it like a context param, unconditionally set it when intel_contexts are created. This doesn't fix anything but does simplify the code a bit. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 43 +++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 4 -- drivers/gpu/drm/i915/gt/intel_context_param.h | 3 +- 3 files changed, 6 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 35bcdeddfbf3f..1091cc04a242a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -233,7 +233,11 @@ static void intel_context_set_gem(struct intel_context *ce, intel_engine_has_timeslices(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags); - intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us); + if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) && + ctx->i915->params.request_timeout_ms) { + unsigned int timeout_ms = ctx->i915->params.request_timeout_ms; + intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000); + } } static void __free_engines(struct i915_gem_engines *e, unsigned int count) @@ -792,41 +796,6 @@ static void __assign_timeline(struct i915_gem_context *ctx, context_apply_all(ctx, __apply_timeline, timeline); } -static int __apply_watchdog(struct intel_context *ce, void *timeout_us) -{ - return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us); -} - -static int -__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us) -{ - int ret; - - ret = context_apply_all(ctx, __apply_watchdog, - (void *)(uintptr_t)timeout_us); - if (!ret) - ctx->watchdog.timeout_us = timeout_us; - - return ret; -} - -static void __set_default_fence_expiry(struct i915_gem_context *ctx) -{ - struct drm_i915_private *i915 = ctx->i915; - int ret; - - if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) || - !i915->params.request_timeout_ms) - return; - - /* Default expiry for user fences. 
*/ - ret = __set_watchdog(ctx, i915->params.request_timeout_ms * 1000); - if (ret) - drm_notice(&i915->drm, - "Failed to configure default fence expiry! (%d)", - ret); -} - static struct i915_gem_context * i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) { @@ -871,8 +840,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) intel_timeline_put(timeline); } - __set_default_fence_expiry(ctx); - trace_i915_context_create(ctx); return ctx; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 5ae71ec936f7c..676592e27e7d2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -153,10 +153,6 @@ struct i915_gem_context { */ atomic_t active_count; - struct { - u64 timeout_us; - } watchdog; - /** * @hang_timestamp: The last time(s) this context caused a GPU hang */ diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h index dffedd983693d..0c69cb42d075c 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_param.h +++ b/drivers/gpu/drm/i915/gt/intel_context_param.h @@ -10,11 +10,10 @@ #include "intel_context.h" -static inline int +static inline void intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us) { ce->watchdog.timeout_us = timeout_us; - return 0; } #endif /* INTEL_CONTEXT_PARAM_H */ -- 2.31.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
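The replacement logic boils down to a unit conversion guarded by the module parameter. A standalone sketch of just the arithmetic, mirroring the `(u64)timeout_ms * 1000` in the diff (the widening cast happens before the multiply, so a large ms value cannot overflow 32 bits):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the conversion in the new intel_context_set_gem() path:
 * a request timeout in milliseconds becomes the per-context watchdog
 * value in microseconds.  The cast to uint64_t comes first so the
 * multiplication is done in 64 bits. */
static uint64_t watchdog_us_from_ms(unsigned int timeout_ms)
{
	return (uint64_t)timeout_ms * 1000;
}
```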
[PATCH 04/21] drm/i915/gem: Return void from context_apply_all
None of the callbacks we use with it return an error code anymore; they all return 0 unconditionally. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 26 +++-- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 1091cc04a242a..8a77855123cec 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -718,32 +718,25 @@ __context_engines_await(const struct i915_gem_context *ctx, return engines; } -static int +static void context_apply_all(struct i915_gem_context *ctx, - int (*fn)(struct intel_context *ce, void *data), + void (*fn)(struct intel_context *ce, void *data), void *data) { struct i915_gem_engines_iter it; struct i915_gem_engines *e; struct intel_context *ce; - int err = 0; e = __context_engines_await(ctx, NULL); - for_each_gem_engine(ce, e, it) { - err = fn(ce, data); - if (err) - break; - } + for_each_gem_engine(ce, e, it) + fn(ce, data); i915_sw_fence_complete(&e->fence); - - return err; } -static int __apply_ppgtt(struct intel_context *ce, void *vm) +static void __apply_ppgtt(struct intel_context *ce, void *vm) { i915_vm_put(ce->vm); ce->vm = i915_vm_get(vm); - return 0; } static struct i915_address_space * @@ -783,10 +776,9 @@ static void __set_timeline(struct intel_timeline **dst, intel_timeline_put(old); } -static int __apply_timeline(struct intel_context *ce, void *timeline) +static void __apply_timeline(struct intel_context *ce, void *timeline) { __set_timeline(&ce->timeline, timeline); - return 0; } static void __assign_timeline(struct i915_gem_context *ctx, @@ -1842,19 +1834,17 @@ set_persistence(struct i915_gem_context *ctx, return __context_set_persistence(ctx, args->value); } -static int __apply_priority(struct intel_context *ce, void *arg) +static void __apply_priority(struct intel_context *ce, void *arg) { struct i915_gem_context *ctx = arg; if 
(!intel_engine_has_timeslices(ce->engine)) - return 0; + return; if (ctx->sched.priority >= I915_PRIORITY_NORMAL) intel_context_set_use_semaphores(ce); else intel_context_clear_use_semaphores(ce); - - return 0; } static int set_priority(struct i915_gem_context *ctx, -- 2.31.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
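The simplification in this patch is easy to see in isolation: once no callback can fail, the error-propagating iterator with early exit collapses into a plain loop. A minimal standalone sketch with simplified stand-in types (not the real i915 structures):

```c
#include <assert.h>
#include <stddef.h>

struct item { int value; };	/* stand-in for intel_context */

/* The void-returning shape of context_apply_all() after this patch:
 * no return value to collect, no early exit on error. */
static void apply_all(struct item *items, size_t n,
		      void (*fn)(struct item *, void *), void *data)
{
	for (size_t i = 0; i < n; i++)
		fn(&items[i], data);
}

/* A callback in the new style: it cannot fail, so it returns void. */
static void add_one(struct item *it, void *data)
{
	(void)data;
	it->value += 1;
}
```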
[PATCH 06/21] drm/i915: Implement SINGLE_TIMELINE with a syncobj (v3)
This API is entirely unnecessary and I'd love to get rid of it. If userspace wants a single timeline across multiple contexts, they can either use implicit synchronization or a syncobj, both of which existed at the time this feature landed. The justification given at the time was that it would help GL drivers which are inherently single-timeline. However, neither of our GL drivers actually wanted the feature. i965 was already in maintenance mode at the time and iris uses syncobj for everything. Unfortunately, as much as I'd love to get rid of it, it is used by the media driver so we can't do that. We can, however, do the next-best thing which is to embed a syncobj in the context and do exactly what we'd expect from userspace internally. This isn't an entirely identical implementation because it's no longer atomic if userspace races with itself by calling execbuffer2 twice simultaneously from different threads. It won't crash in that case; it just doesn't guarantee any ordering between those two submits. Moving SINGLE_TIMELINE to a syncobj emulation has a couple of technical advantages beyond mere annoyance. One is that intel_timeline is no longer an api-visible object and can remain entirely an implementation detail. This may be advantageous as we make scheduler changes going forward. Second is that, together with deleting the CLONE_CONTEXT API, we should now have a 1:1 mapping between intel_context and intel_timeline which may help us reduce locking. v2 (Jason Ekstrand): - Update the comment on i915_gem_context::syncobj to mention that it's an emulation and the possible race if userspace calls execbuffer2 twice on the same context concurrently. 
- Wrap the checks for eb.gem_context->syncobj in unlikely() - Drop the dma_fence reference - Improved commit message v3 (Jason Ekstrand): - Move the dma_fence_put() to before the error exit Signed-off-by: Jason Ekstrand Cc: Maarten Lankhorst Cc: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 49 +-- .../gpu/drm/i915/gem/i915_gem_context_types.h | 14 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 16 ++ 3 files changed, 40 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 2c2fefa912805..a72c9b256723b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -67,6 +67,8 @@ #include #include +#include + #include "gt/gen6_ppgtt.h" #include "gt/intel_context.h" #include "gt/intel_context_param.h" @@ -225,10 +227,6 @@ static void intel_context_set_gem(struct intel_context *ce, ce->vm = vm; } - GEM_BUG_ON(ce->timeline); - if (ctx->timeline) - ce->timeline = intel_timeline_get(ctx->timeline); - if (ctx->sched.priority >= I915_PRIORITY_NORMAL && intel_engine_has_timeslices(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags); @@ -351,9 +349,6 @@ void i915_gem_context_release(struct kref *ref) mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); - if (ctx->timeline) - intel_timeline_put(ctx->timeline); - put_pid(ctx->pid); mutex_destroy(&ctx->mutex); @@ -570,6 +565,9 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm); + if (ctx->syncobj) + drm_syncobj_put(ctx->syncobj); + ctx->file_priv = ERR_PTR(-EBADF); /* @@ -765,33 +763,11 @@ static void __assign_ppgtt(struct i915_gem_context *ctx, i915_vm_close(vm); } -static void __set_timeline(struct intel_timeline **dst, - struct intel_timeline *src) -{ - struct intel_timeline *old = *dst; - - *dst = src ? 
intel_timeline_get(src) : NULL; - - if (old) - intel_timeline_put(old); -} - -static void __apply_timeline(struct intel_context *ce, void *timeline) -{ - __set_timeline(&ce->timeline, timeline); -} - -static void __assign_timeline(struct i915_gem_context *ctx, - struct intel_timeline *timeline) -{ - __set_timeline(&ctx->timeline, timeline); - context_apply_all(ctx, __apply_timeline, timeline); -} - static struct i915_gem_context * i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) { struct i915_gem_context *ctx; + int ret; if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE && !HAS_EXECLISTS(i915)) @@ -820,16 +796,13 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) } if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) { -
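The emulation strategy from the commit message — keep the fence of the last submission in the context's syncobj, make each new submission wait on it, then replace it — can be sketched with a toy fence model. Everything here is a deliberately simplified stand-in for drm_syncobj/dma_fence, with fences reduced to sequence numbers:

```c
#include <assert.h>

struct toy_ctx { int last_fence; };	/* stand-in for ctx->syncobj */

/* Each submission waits on the previous submission's fence and then
 * installs its own fence as the new "last" one.  Note this sequence is
 * not atomic: two racing submitters may observe the same old fence,
 * which matches the commit message's caveat that ordering between
 * concurrent execbuffer2 calls is no longer guaranteed. */
static void submit(struct toy_ctx *ctx, int new_fence, int *in_fence)
{
	*in_fence = ctx->last_fence;	/* implicit in-fence from the syncobj */
	ctx->last_fence = new_fence;	/* out-fence replaces the syncobj fence */
}
```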
[PATCH 07/21] drm/i915: Drop getparam support for I915_CONTEXT_PARAM_ENGINES
This has never been used by any userspace except IGT and provides no real functionality beyond parroting back parameters userspace passed in as part of context creation or via setparam. If the context is in legacy mode (where you use I915_EXEC_RENDER and friends), it returns success with zero data so it's not useful for discovering what engines are in the context. It's also not a replacement for the recently removed I915_CONTEXT_CLONE_ENGINES because it doesn't return any of the balancing or bonding information. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 77 + 1 file changed, 1 insertion(+), 76 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index a72c9b256723b..e8179918fa306 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1725,78 +1725,6 @@ set_engines(struct i915_gem_context *ctx, return 0; } -static int -get_engines(struct i915_gem_context *ctx, - struct drm_i915_gem_context_param *args) -{ - struct i915_context_param_engines __user *user; - struct i915_gem_engines *e; - size_t n, count, size; - bool user_engines; - int err = 0; - - e = __context_engines_await(ctx, &user_engines); - if (!e) - return -ENOENT; - - if (!user_engines) { - i915_sw_fence_complete(&e->fence); - args->size = 0; - return 0; - } - - count = e->num_engines; - - /* Be paranoid in case we have an impedance mismatch */ - if (!check_struct_size(user, engines, count, &size)) { - err = -EINVAL; - goto err_free; - } - if (overflows_type(size, args->size)) { - err = -EINVAL; - goto err_free; - } - - if (!args->size) { - args->size = size; - goto err_free; - } - - if (args->size < size) { - err = -EINVAL; - goto err_free; - } - - user = u64_to_user_ptr(args->value); - if (put_user(0, &user->extensions)) { - err = -EFAULT; - goto err_free; - } - - for (n = 0; n < count; n++) { - struct i915_engine_class_instance ci = { - .engine_class 
= I915_ENGINE_CLASS_INVALID, - .engine_instance = I915_ENGINE_CLASS_INVALID_NONE, - }; - - if (e->engines[n]) { - ci.engine_class = e->engines[n]->engine->uabi_class; - ci.engine_instance = e->engines[n]->engine->uabi_instance; - } - - if (copy_to_user(&user->engines[n], &ci, sizeof(ci))) { - err = -EFAULT; - goto err_free; - } - } - - args->size = size; - -err_free: - i915_sw_fence_complete(&e->fence); - return err; -} - static int set_persistence(struct i915_gem_context *ctx, const struct drm_i915_gem_context_param *args) @@ -2127,10 +2055,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, ret = get_ppgtt(file_priv, ctx, args); break; - case I915_CONTEXT_PARAM_ENGINES: - ret = get_engines(ctx, args); - break; - case I915_CONTEXT_PARAM_PERSISTENCE: args->size = 0; args->value = i915_gem_context_is_persistent(ctx); @@ -2138,6 +2062,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: + case I915_CONTEXT_PARAM_ENGINES: case I915_CONTEXT_PARAM_RINGSIZE: default: ret = -EINVAL; -- 2.31.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 05/21] drm/i915: Drop the CONTEXT_CLONE API
This API allows one context to grab bits out of another context upon
creation. It can be used as a short-cut for setparam(getparam()) for
things like I915_CONTEXT_PARAM_VM. However, it's never been used by any
real userspace. It's used by a few IGT tests and that's it. Since it
doesn't add any real value (most of the stuff you can CLONE you can
copy in other ways), drop it.

There is one thing that this API allows you to clone which you cannot
clone via getparam/setparam: timelines. However, timelines are an
implementation detail of i915 and not really something that needs to be
exposed to userspace. Also, sharing timelines between contexts isn't
obviously useful and supporting it has the potential to complicate i915
internally. It also doesn't add any functionality that the client can't
get in other ways. If a client really wants a shared timeline, they can
use a syncobj and set it as an in and out fence on every submit.

Signed-off-by: Jason Ekstrand
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 199 +---
 include/uapi/drm/i915_drm.h                 |  16 +-
 2 files changed, 6 insertions(+), 209 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 8a77855123cec..2c2fefa912805 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1958,207 +1958,14 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(arg->fpriv, arg->ctx, &local.param);
 }
 
-static int clone_engines(struct i915_gem_context *dst,
-			 struct i915_gem_context *src)
+static int invalid_ext(struct i915_user_extension __user *ext, void *data)
 {
-	struct i915_gem_engines *clone, *e;
-	bool user_engines;
-	unsigned long n;
-
-	e = __context_engines_await(src, &user_engines);
-	if (!e)
-		return -ENOENT;
-
-	clone = alloc_engines(e->num_engines);
-	if (!clone)
-		goto err_unlock;
-
-	for (n = 0; n < e->num_engines; n++) {
-		struct intel_engine_cs *engine;
-
-		if (!e->engines[n]) {
-			clone->engines[n] = NULL;
-			continue;
-		}
-		engine = e->engines[n]->engine;
-
-		/*
-		 * Virtual engines are singletons; they can only exist
-		 * inside a single context, because they embed their
-		 * HW context... As each virtual context implies a single
-		 * timeline (each engine can only dequeue a single request
-		 * at any time), it would be surprising for two contexts
-		 * to use the same engine. So let's create a copy of
-		 * the virtual engine instead.
-		 */
-		if (intel_engine_is_virtual(engine))
-			clone->engines[n] =
-				intel_execlists_clone_virtual(engine);
-		else
-			clone->engines[n] = intel_context_create(engine);
-		if (IS_ERR_OR_NULL(clone->engines[n])) {
-			__free_engines(clone, n);
-			goto err_unlock;
-		}
-
-		intel_context_set_gem(clone->engines[n], dst);
-	}
-	clone->num_engines = n;
-	i915_sw_fence_complete(&e->fence);
-
-	/* Serialised by constructor */
-	engines_idle_release(dst, rcu_replace_pointer(dst->engines, clone, 1));
-	if (user_engines)
-		i915_gem_context_set_user_engines(dst);
-	else
-		i915_gem_context_clear_user_engines(dst);
-	return 0;
-
-err_unlock:
-	i915_sw_fence_complete(&e->fence);
-	return -ENOMEM;
-}
-
-static int clone_flags(struct i915_gem_context *dst,
-		       struct i915_gem_context *src)
-{
-	dst->user_flags = src->user_flags;
-	return 0;
-}
-
-static int clone_schedattr(struct i915_gem_context *dst,
-			   struct i915_gem_context *src)
-{
-	dst->sched = src->sched;
-	return 0;
-}
-
-static int clone_sseu(struct i915_gem_context *dst,
-		      struct i915_gem_context *src)
-{
-	struct i915_gem_engines *e = i915_gem_context_lock_engines(src);
-	struct i915_gem_engines *clone;
-	unsigned long n;
-	int err;
-
-	/* no locking required; sole access under constructor */
-	clone = __context_engines_static(dst);
-	if (e->num_engines != clone->num_engines) {
-		err = -EINVAL;
-		goto unlock;
-	}
-
-	for (n = 0; n < e->num_engines; n++) {
-		struct intel_context *ce = e->engines[n];
-
-		if (clone->engines[n]->
[PATCH 08/21] drm/i915/gem: Disallow bonding of virtual engines
This adds a bunch of complexity which the media driver has never
actually used. The media driver does technically bond a balanced engine
to another engine but the balanced engine only has one engine in the
sibling set. This doesn't actually result in a virtual engine.

Unless some userspace badly wants it, there's no good reason to support
this case. This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op. We
leave the validation code in place in case we ever decide we want to do
something interesting with the bonding information.

Signed-off-by: Jason Ekstrand
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
 .../drm/i915/gt/intel_execlists_submission.c  | 100 ----
 .../drm/i915/gt/intel_execlists_submission.h  |   4 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 --
 6 files changed, 7 insertions(+), 353 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e8179918fa306..5f8d0faf783aa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 	}
 
 	virtual = set->engines->engines[idx]->engine;
+	if (intel_engine_is_virtual(virtual)) {
+		drm_dbg(&i915->drm,
+			"Bonding with virtual engines not allowed\n");
+		return -EINVAL;
+	}
+
 	err = check_user_mbz(&ext->flags);
 	if (err)
 		return err;
@@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 			       n, ci.engine_class, ci.engine_instance);
 			return -EINVAL;
 		}
-
-		/*
-		 * A non-virtual engine has no siblings to choose between; and
-		 * a submit fence will always be directed to the one engine.
-		 */
-		if (intel_engine_is_virtual(virtual)) {
-			err = intel_virtual_engine_attach_bond(virtual,
-							       master,
-							       bond);
-			if (err)
-				return err;
-		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d640bba6ad9ab..efb2fa3522a42 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		if (args->flags & I915_EXEC_FENCE_SUBMIT)
 			err = i915_request_await_execution(eb.request,
 							   in_fence,
-							   eb.engine->bond_execute);
+							   NULL);
 		else
 			err = i915_request_await_dma_fence(eb.request,
 							   in_fence);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 883bafc449024..68cfe5080325c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -446,13 +446,6 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
-	/*
-	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
-	 * request down to the bonded pairs.
-	 */
-	void		(*bond_execute)(struct i915_request *rq,
-					struct dma_fence *signal);
-
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44d..b6e2b59f133b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -181,18 +181,6 @@ struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
-	/*
-	 * Keep track of bonded pairs -- restrictions upon on our selection
-	 * of physical engines any particular request may be submitted to.
-	 * If we receive a submit-fence from a master engine, we will only
-	 * use one of sibling_mask physical engines.
-	 */
-	struct ve_bond {
-		const struct intel_engine_cs *master;
-
[PATCH 12/21] drm/i915/gem: Add a separate validate_priority helper
Signed-off-by: Jason Ekstrand
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 42 +
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 941fbf78267b4..e5efd22c89ba2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -169,6 +169,28 @@ lookup_user_engine(struct i915_gem_context *ctx,
 	return i915_gem_context_get_engine(ctx, idx);
 }
 
+static int validate_priority(struct drm_i915_private *i915,
+			     const struct drm_i915_gem_context_param *args)
+{
+	s64 priority = args->value;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+		return -ENODEV;
+
+	if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
+	    priority < I915_CONTEXT_MIN_USER_PRIORITY)
+		return -EINVAL;
+
+	if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
+	    !capable(CAP_SYS_NICE))
+		return -EPERM;
+
+	return 0;
+}
+
 static struct i915_address_space *
 context_get_vm_rcu(struct i915_gem_context *ctx)
 {
@@ -1744,23 +1766,13 @@ static void __apply_priority(struct intel_context *ce, void *arg)
 static int set_priority(struct i915_gem_context *ctx,
 			const struct drm_i915_gem_context_param *args)
 {
-	s64 priority = args->value;
-
-	if (args->size)
-		return -EINVAL;
-
-	if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
-		return -ENODEV;
-
-	if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
-	    priority < I915_CONTEXT_MIN_USER_PRIORITY)
-		return -EINVAL;
+	int err;
 
-	if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
-	    !capable(CAP_SYS_NICE))
-		return -EPERM;
+	err = validate_priority(ctx->i915, args);
+	if (err)
+		return err;
 
-	ctx->sched.priority = priority;
+	ctx->sched.priority = args->value;
 	context_apply_all(ctx, __apply_priority, ctx);
 
 	return 0;
-- 
2.31.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 10/21] drm/i915/request: Remove the hook from await_execution
This was only ever used for bonded virtual engine execution. Since
that's no longer allowed, this is dead code.

Signed-off-by: Jason Ekstrand
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c |  3 +-
 drivers/gpu/drm/i915/i915_request.c        | 42 ---
 drivers/gpu/drm/i915/i915_request.h        |  4 +-
 3 files changed, 9 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index efb2fa3522a42..7024adcd5cf15 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3473,8 +3473,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (in_fence) {
 		if (args->flags & I915_EXEC_FENCE_SUBMIT)
 			err = i915_request_await_execution(eb.request,
-							   in_fence,
-							   NULL);
+							   in_fence);
 		else
 			err = i915_request_await_dma_fence(eb.request,
 							   in_fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index bec9c3652188b..7e00218b8c105 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -49,7 +49,6 @@
 struct execute_cb {
 	struct irq_work work;
 	struct i915_sw_fence *fence;
-	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
 	struct i915_request *signal;
 };
@@ -180,17 +179,6 @@ static void irq_execute_cb(struct irq_work *wrk)
 	kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
-static void irq_execute_cb_hook(struct irq_work *wrk)
-{
-	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
-
-	cb->hook(container_of(cb->fence, struct i915_request, submit),
-		 &cb->signal->fence);
-	i915_request_put(cb->signal);
-
-	irq_execute_cb(wrk);
-}
-
 static __always_inline void
 __notify_execute_cb(struct i915_request *rq, bool (*fn)(struct irq_work *wrk))
 {
@@ -517,17 +505,12 @@ static bool __request_in_flight(const struct i915_request *signal)
 static int
 __await_execution(struct i915_request *rq,
 		  struct i915_request *signal,
-		  void (*hook)(struct i915_request *rq,
-			       struct dma_fence *signal),
 		  gfp_t gfp)
 {
 	struct execute_cb *cb;
 
-	if (i915_request_is_active(signal)) {
-		if (hook)
-			hook(rq, &signal->fence);
+	if (i915_request_is_active(signal))
 		return 0;
-	}
 
 	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
 	if (!cb)
@@ -537,12 +520,6 @@ __await_execution(struct i915_request *rq,
 	i915_sw_fence_await(cb->fence);
 	init_irq_work(&cb->work, irq_execute_cb);
 
-	if (hook) {
-		cb->hook = hook;
-		cb->signal = i915_request_get(signal);
-		cb->work.func = irq_execute_cb_hook;
-	}
-
 	/*
 	 * Register the callback first, then see if the signaler is already
 	 * active. This ensures that if we race with the
@@ -1253,7 +1230,7 @@ emit_semaphore_wait(struct i915_request *to,
 		goto await_fence;
 
 	/* Only submit our spinner after the signaler is running! */
-	if (__await_execution(to, from, NULL, gfp))
+	if (__await_execution(to, from, gfp))
 		goto await_fence;
 
 	if (__emit_semaphore_wait(to, from, from->fence.seqno))
@@ -1284,16 +1261,14 @@ static int intel_timeline_sync_set_start(struct intel_timeline *tl,
 
 static int
 __i915_request_await_execution(struct i915_request *to,
-			       struct i915_request *from,
-			       void (*hook)(struct i915_request *rq,
-					    struct dma_fence *signal))
+			       struct i915_request *from)
 {
 	int err;
 
 	GEM_BUG_ON(intel_context_is_barrier(from->context));
 
 	/* Submit both requests at the same time */
-	err = __await_execution(to, from, hook, I915_FENCE_GFP);
+	err = __await_execution(to, from, I915_FENCE_GFP);
 	if (err)
 		return err;
@@ -1406,9 +1381,7 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 
 int
 i915_request_await_execution(struct i915_request *rq,
-			     struct dma_fence *fence,
-			     void (*hook)(struct i915_request *rq,
-					  struct dma_fence *signal))
+			     struct dma_fence *fence)
 {
 	struct dma_fence **child = &fence;
 	unsigned int nc
[PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines
There's no sense in allowing userspace to create more engines than it
can possibly access via execbuf.

Signed-off-by: Jason Ekstrand
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5f8d0faf783aa..ecb3bf5369857 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
 		return -EINVAL;
 	}
 
-	/*
-	 * Note that I915_EXEC_RING_MASK limits execbuf to only using the
-	 * first 64 engines defined here.
-	 */
 	num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);
+	if (num_engines > I915_EXEC_RING_MASK + 1)
+		return -EINVAL;
+
 	set.engines = alloc_engines(num_engines);
 	if (!set.engines)
 		return -ENOMEM;
-- 
2.31.1
[PATCH 11/21] drm/i915: Stop manually RCU banging in reset_stats_ioctl
As far as I can tell, the only real reason for this is to avoid taking
a reference to the i915_gem_context. The cost of those two atomics
probably pales in comparison to the cost of the ioctl itself so we're
really not buying ourselves anything here. We're about to make context
lookup a tiny bit more complicated, so let's get rid of the one
hand-rolled case.

Signed-off-by: Jason Ekstrand
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 -
 drivers/gpu/drm/i915/i915_drv.h             |  8 +---
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ecb3bf5369857..941fbf78267b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2090,16 +2090,13 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_reset_stats *args = data;
 	struct i915_gem_context *ctx;
-	int ret;
 
 	if (args->flags || args->pad)
 		return -EINVAL;
 
-	ret = -ENOENT;
-	rcu_read_lock();
-	ctx = __i915_gem_context_lookup_rcu(file->driver_priv, args->ctx_id);
+	ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id);
 	if (!ctx)
-		goto out;
+		return -ENOENT;
 
 	/*
 	 * We opt for unserialised reads here. This may result in tearing
@@ -2116,10 +2113,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	args->batch_active = atomic_read(&ctx->guilty_count);
 	args->batch_pending = atomic_read(&ctx->active_count);
 
-	ret = 0;
-out:
-	rcu_read_unlock();
-	return ret;
+	i915_gem_context_put(ctx);
+	return 0;
 }
 
 /* GEM context-engines iterator: for_each_gem_engine() */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0b44333eb7033..8571c5c1509a7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1840,19 +1840,13 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj,
 				      int flags);
 
-static inline struct i915_gem_context *
-__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id)
-{
-	return xa_load(&file_priv->context_xa, id);
-}
-
 static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
 	struct i915_gem_context *ctx;
 
 	rcu_read_lock();
-	ctx = __i915_gem_context_lookup_rcu(file_priv, id);
+	ctx = xa_load(&file_priv->context_xa, id);
 	if (ctx && !kref_get_unless_zero(&ctx->ref))
 		ctx = NULL;
 	rcu_read_unlock();
-- 
2.31.1