[PATCH -next] video: fbdev: intelfb: Remove set but not used variable 'val'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/video/fbdev/intelfb/intelfb_i2c.c: In function 'intelfb_gpio_setscl':
drivers/video/fbdev/intelfb/intelfb_i2c.c:58:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable]
drivers/video/fbdev/intelfb/intelfb_i2c.c: In function 'intelfb_gpio_setsda':
drivers/video/fbdev/intelfb/intelfb_i2c.c:69:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable]

It has never been used since its introduction.

Signed-off-by: Baokun Li
---
 drivers/video/fbdev/intelfb/intelfb_i2c.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/video/fbdev/intelfb/intelfb_i2c.c b/drivers/video/fbdev/intelfb/intelfb_i2c.c
index 3300bd31d9d7..4df2f1f8a18e 100644
--- a/drivers/video/fbdev/intelfb/intelfb_i2c.c
+++ b/drivers/video/fbdev/intelfb/intelfb_i2c.c
@@ -55,22 +55,20 @@ static void intelfb_gpio_setscl(void *data, int state)
 {
 	struct intelfb_i2c_chan *chan = data;
 	struct intelfb_info *dinfo = chan->dinfo;
-	u32 val;

 	OUTREG(chan->reg, (state ? SCL_VAL_OUT : 0) |
 	       SCL_DIR | SCL_DIR_MASK | SCL_VAL_MASK);
-	val = INREG(chan->reg);
+	INREG(chan->reg);
 }

 static void intelfb_gpio_setsda(void *data, int state)
 {
 	struct intelfb_i2c_chan *chan = data;
 	struct intelfb_info *dinfo = chan->dinfo;
-	u32 val;

 	OUTREG(chan->reg, (state ? SDA_VAL_OUT : 0) |
 	       SDA_DIR | SDA_DIR_MASK | SDA_VAL_MASK);
-	val = INREG(chan->reg);
+	INREG(chan->reg);
 }

 static int intelfb_gpio_getscl(void *data)
-- 
2.25.4
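The patch drops only the variable. The INREG() read stays. The patch does not say why, so this is an assumption: reading the register back is commonly done to force the preceding posted MMIO write out to the device before the bit-banging i2c code continues.

The sketch below reproduces that shape in plain user-space C. All names are hypothetical stand-ins, not the intelfb OUTREG()/INREG() macros:
- `fake_gpio_reg`, `fake_outreg()`, `fake_inreg()` and `setscl_sketch()` model the register and the setter.
- The read counter exists only to show that the read still happens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a memory-mapped GPIO register and its
 * accessors; these are not the intelfb OUTREG()/INREG() macros. */
static volatile uint32_t fake_gpio_reg;
static unsigned int fake_reads; /* counts reads so the side effect is visible */

static void fake_outreg(uint32_t val)
{
	fake_gpio_reg = val;
}

static uint32_t fake_inreg(void)
{
	fake_reads++;
	return fake_gpio_reg;
}

/* Shape of intelfb_gpio_setscl() after the patch: no local variable,
 * but the read-back is still issued. */
static void setscl_sketch(int state, uint32_t val_out, uint32_t dir_bits)
{
	fake_outreg((state ? val_out : 0) | dir_bits);
	(void)fake_inreg(); /* result intentionally discarded */
}
```

The `(void)` cast is one common way to mark a deliberately discarded result. The patch itself simply calls `INREG(chan->reg);` as a statement, which is equally valid C.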
Re: [Intel-gfx] [PATCH 15/18] drm/i915/guc: Ensure H2G buffer updates visible before tail update
On 28.05.2021 03:13, John Harrison wrote: > On 5/26/2021 10:58, Matthew Brost wrote: >> On Wed, May 26, 2021 at 02:36:18PM +0200, Michal Wajdeczko wrote: >>> On 26.05.2021 08:42, Matthew Brost wrote: Ensure H2G buffer updates are visible before descriptor tail updates by inserting a barrier between the H2G buffer update and the tail. The barrier is simple wmb() for SMEM and is register write for LMEM. This is needed if more than 1 H2G can be inflight at once. Signed-off-by: Matthew Brost Cc: Michal Wajdeczko --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index fb875d257536..42063e1c355d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -328,6 +328,18 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct) return ++ct->requests.last_fence; } +static void write_barrier(struct intel_guc_ct *ct) { + struct intel_guc *guc = ct_to_guc(ct); + struct intel_gt *gt = guc_to_gt(guc); + + if (i915_gem_object_is_lmem(guc->ct.vma->obj)) { + GEM_BUG_ON(guc->send_regs.fw_domains); + intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0); >>> hmm, as this is one of the GuC scratch registers used for H2G MMIO >>> communication, writing 0 there might be interpreted by the GuC as new >>> request with action=0 and might result in extra processing/logging on >>> GuC side, and, since from here we don't protect access to this register >>> by send_mutex, we can corrupt other MMIO message being prepared from >>> different thread, ... can't we use other register ? >>> >> Hmm, this code has been internal for a long time and we haven't seen any >> issues. MMIOs are always attempted to be processed each interrupt and >> then CTBs are processed next. A value of 0 in scratch0 results in no MMIOs >> being processed as a value of 0 is a reserved action which translates to >> a NOP.
>> >> Also in the current i915 once CTBs are enabled MMIOs are never used. >> That being said, I think once we transition to the new interface + >> enable suspend on a VF MMIOs might be used. >> >> With that I propose that we merge this as is with a comment saying if we >> ever mix CTBs and MMIOs we need to find another MMIO register. I don't >> think changing this now is worth delaying upstreaming this and also any change >> we make now will make us lose confidence in code that has been >> thoroughly tested. >> >> Matt > This was discussed in chat while inspecting the GuC firmware code. > Writing zero to the scratch does indeed not trigger any extra processing > of spurious MMIO H2Gs. The register is indeed always checked when the > host triggers a CTB H2G, but zero counts as invalid and thus will be > skipped. > > So with a comment about not mixing CTB and MMIOs, I think we are good > for now. It seems unlikely that MMIOs & CTB would be mixed. MMIOs are > only used for initialisation operations and should not be necessary once > the CTBs are up and running. If mixing does occur in the future, it > sounds like something that should be addressed at the GuC architecture > level! well, unlikely is not the same as not possible... especially that on MMIO path we are protecting access to this register, so maybe, to try to capture any unexpected scenarios, we should at least add something like: GEM_WARN_ON(mutex_is_locked(&guc->send_mutex)) and since you already check for send_regs.fw_domains actual register offset should be taken from send_regs.base? alternatively, since I doubt that we have to use this specific send register, we could define i915 level function for the purpose of triggering write barrier (or maybe we already have one?) that is using register that is not conflicting with guc MMIO communication...
note, in case you can't find any other safe register to write, maybe a better option is SOFT_SCRATCH(0xc180) that is still available on Gen11, but it is not used by GuC any more for MMIO communication, and on Gen9 we don't have lmem so no conflict at all, which we can check with: GEM_BUG_ON(send_regs.base == SOFT_SCRATCH) and then we should be safe for sure, not just "unlikely" Michal > > With the comment added: > Reviewed-by: John Harrison > > >> + } else { + wmb(); + } +} + /** * DOC: CTB Host to GuC request * @@ -411,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct, } GEM_BUG_ON(tail > size); + /* + * make sure H2G buffer update and LRC tail update (if this triggering a + * submission) are visible before updating the descriptor tail + */ + write_barrier(ct); + /* now update de
Re: [PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace
Hi Alyssa,

Will this be enough to implement GL_TIMESTAMP and GL_TIME_ELAPSED queries? I guess the DDK implements these as WRITE_VALUE jobs, and there's also a soft job BASE_JD_REQ_SOFT_DUMP_CPU_GPU_TIME that is probably used for glGet*(GL_TIMESTAMP). Other DRM drivers use an ioctl for that instead.

Regards,

Tomeu

On 5/27/21 10:38 PM, alyssa.rosenzw...@collabora.com wrote:
From: Alyssa Rosenzweig

Mali has hardware cycle counters (and GPU timestamps) available for profiling. These are exposed in various ways:

- Kernel: As CYCLE_COUNT and TIMESTAMP registers
- Job chain: As WRITE_VALUE descriptors
- Shader (Midgard): As LD_SPECIAL selectors
- Shader (Bifrost): As the LD_GCLK.u64 instruction

These form building blocks for profiling features, for example the ARB_shader_clock extension, which accesses the counters from an application's shader.

The counters consume power, so it is recommended to disable them when not in use. To do so, we follow the strategy from mali_kbase: add a counter requirement to the job, start the counters only when required, and stop them as quickly as possible.

The new UABI will be used in Mesa. An implementation of ARB_shader_clock using this UABI is available as a pending upstream merge request [1]. The implementation passes the relevant piglit test, validating both the kernel and Mesa.

The main outstanding question is the proper name. Performance monitoring ("PERMON") is the name used by kbase, but it's jargon-y and risks confusion with performance counters, an orthogonal mechanism. Cycle count is more descriptive and matches the actual hardware name, but obscures that the same mechanism is required for GPU timestamps.

This bit of bikeshedding aside, I'm pleased with the patches.
[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/11051

Alyssa Rosenzweig (4):
  drm/panfrost: Add cycle counter job requirement
  drm/panfrost: Add CYCLE_COUNT_START/STOP commands
  drm/panfrost: Add permon acquire/release helpers
  drm/panfrost: Handle PANFROST_JD_REQ_PERMON

 drivers/gpu/drm/panfrost/panfrost_device.h |  3 +++
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 10 +++---
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 20
 drivers/gpu/drm/panfrost/panfrost_gpu.h    |  3 +++
 drivers/gpu/drm/panfrost/panfrost_job.c    |  6 ++
 drivers/gpu/drm/panfrost/panfrost_regs.h   |  2 ++
 include/uapi/drm/panfrost_drm.h            |  3 ++-
 7 files changed, 43 insertions(+), 4 deletions(-)
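The cover letter's power-saving strategy amounts to reference counting:
- each job carries a counter requirement;
- the counters start only when the first job needs them;
- they stop as soon as the last such job is done.

A minimal sketch, assuming a mutex-protected user count. `struct counter_dev`, `counters_acquire()` and `counters_release()` are hypothetical and are not the panfrost acquire/release helpers added by this series. The CYCLE_COUNT_START/STOP register writes are modeled as a flag plus two counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model: counters run while at least one job needs them. */
struct counter_dev {
	pthread_mutex_t lock;
	unsigned int users;  /* jobs currently requiring the counters */
	bool running;        /* models the START/STOP hardware state */
	unsigned int starts; /* number of START commands issued */
	unsigned int stops;  /* number of STOP commands issued */
};

/* First user starts the counters. */
static void counters_acquire(struct counter_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	if (dev->users++ == 0) {
		dev->running = true; /* would issue CYCLE_COUNT_START */
		dev->starts++;
	}
	pthread_mutex_unlock(&dev->lock);
}

/* Last user stops them again, as early as possible. */
static void counters_release(struct counter_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	assert(dev->users > 0);
	if (--dev->users == 0) {
		dev->running = false; /* would issue CYCLE_COUNT_STOP */
		dev->stops++;
	}
	pthread_mutex_unlock(&dev->lock);
}
```

The START/STOP decision sits under the same lock as the count. This keeps a concurrent "last release" and "first acquire" from reordering the two commands.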
Re: [PATCH 0/4] Fix the i2c/clk bug of stm32 mcu platform
Hi Patrice, Alain, Could you please take a look at this patchset? Thanks. This series is rebased onto the newest kernel commit: 88b06399c9c766c283e070b022b5ceafa4f63f19 as requested in: https://lore.kernel.org/lkml/ff2bc09d-1a17-50d4-d3ee-16fd3a86d...@foss.st.com/ The clk bug affects kernel boot on the stm32f469-disco board when the display config (CONFIG_DRM_STM, CONFIG_DRM_STM_DSI, DRM_PANEL_ORISETECH_OTM8009A) is enabled. If you want to test the clk patch on the stm32f429-disco board, panel-ilitek-ili9341.c can be used for that purpose (CONFIG_DRM_STM, DRM_PANEL_ILITEK_ILI9341). The i2c driver patch is intended to fix a timeout when the touch panel driver reads data over the i2c bus. Best regards. Dillon On Fri, May 14, 2021 at 7:02 PM wrote: > > From: Dillon Min > > This series fixes three i2c/clk bugs for stm32 f4/f7 > - kernel running in sdram, i2c driver times out reading data > - ltdc clk turned off after the kernel console becomes active > - kernel hangs when setting the ltdc clock rate > > clk bugs found on stm32f429/f469-disco boards > > Hi Patrice: > below is the guide to verify the patch: > > set up the test env with the following files (links under 'files link' below): > [1] u-boot-dtb.bin > [2] rootfs zip file (used in kernel initramfs) > [3] u-boot's mkimage to create itb file > [4] kernel config file > [5] my itb with-or-without i2c patch > > This patch is based on kernel commit: > 88b06399c9c766c283e070b022b5ceafa4f63f19 > > Note: > panel-ilitek-ili9341.c is the driver which was submitted last year, but did not > get accepted. It's used to set up touch screen calibration, then test i2c.
> > create itb file (please correct the path of 'data'): > ./mkimage -f stm32.its stm32.itb > > HW setup: > console: >PA9, PA10 >usart0 >serial@40011000 >115200 8n1 > > -- flash u-boot.bin to stm32f429-disco on PC > $ sudo openocd -f board/stm32f429discovery.cfg -c \ > '{PATH-TO-YOUR-UBOOT}/u-boot-dtb.bin 0x0800 exit reset' > > -- set up the kernel bootargs in u-boot > U-Boot > setenv bootargs 'console=tty0 console=ttySTM0,115200 > root=/dev/ram rdinit=/linuxrc loglevel=8 fbcon=rotate:2' > U-Boot > loady;bootm > (download stm32.dtb or your kernel in itb format, or download zImage and dtb) > > -- set up the ts_calibrate environment on stm32f429-disco > / # export TSLIB_CONFFILE=/etc/ts.conf > / # export TSLIB_TSDEVICE=/dev/input/event0 > / # export TSLIB_CONSOLEDEVICE=none > / # export TSLIB_FBDEVICE=/dev/fb0 > > -- clear screen > / # ./fb > > -- run ts_calibrate > / # ts_calibrate > (you can calibrate the touchscreen now, and will get the errors below) > > [ 113.942087] stmpe-i2c0-0041: failed to read regs 0x52: -110 > [ 114.063598] stmpe-i2c 0-0041: failed to read reg 0x4b: -16 > [ 114.185629] stmpe-i2c 0-0041: failed to read reg 0x40: -16 > [ 114.307257] stmpe-i2c 0-0041: failed to write reg 0xb: -16 > > ...
> > with the i2c patch applied, you will see the logs below: > > RAW-> 3164 908 183 118.110884 > TS_READ_RAW> x = 3164, y =908, pressure = 183 > RAW-> 3166 922 126 118.138946 > TS_READ_RAW> x = 3166, y = 922, pressure = 126 > > > files link: > https://drive.google.com/drive/folders/1qNbjChcB6UGtKzne2F5x9_WG_sZFyo3o?usp=sharing > > > > > Dillon Min (4): > drm/panel: Add ilitek ili9341 panel driver > i2c: stm32f4: Fix stmpe811 get xyz data timeout issue > clk: stm32: Fix stm32f429's ltdc driver hang in set clock rate > clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after > kernel startup > > drivers/clk/clk-stm32f4.c| 10 +- > drivers/gpu/drm/panel/Kconfig| 12 + > drivers/gpu/drm/panel/Makefile |1 + > drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 1285 > ++ > drivers/i2c/busses/i2c-stm32f4.c | 12 +- > 5 files changed, 1310 insertions(+), 10 deletions(-) > create mode 100755 drivers/gpu/drm/panel/panel-ilitek-ili9341.c > > -- > 2.7.4 >
Re: [PATCH] drm: Fix for GEM buffers with write-combine memory
On 28/05/2021 02:03, Paul Cercueil wrote: The previous commit wrongly assumed that dma_mmap_wc() could be replaced by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup, but did not work everywhere. Use dma_mmap_wc() when the buffer has the write-combine cache attribute, and dma_mmap_pages() when it has the non-coherent cache attribute. Signed-off-by: Paul Cercueil Reported-by: Tomi Valkeinen Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent memory") --- drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index 235c7a63da2b..4c3772651954 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) cma_obj = to_drm_gem_cma_obj(obj); - vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); - if (!cma_obj->map_noncoherent) - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); + if (cma_obj->map_noncoherent) { + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); + + ret = dma_mmap_pages(cma_obj->base.dev->dev, +vma, vma->vm_end - vma->vm_start, +virt_to_page(cma_obj->vaddr)); + } else { + ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr, + cma_obj->paddr, vma->vm_end - vma->vm_start); - ret = dma_mmap_pages(cma_obj->base.dev->dev, -vma, vma->vm_end - vma->vm_start, -virt_to_page(cma_obj->vaddr)); + } if (ret) drm_gem_vm_close(vma); Reviewed-by: Tomi Valkeinen and Tested-by: Tomi Valkeinen Thanks! Btw, the kernel-doc for drm_gem_cma_create doesn't quite match, as it says wc is always used. Tomi
[PATCH 1/1] drm/i915/selftests: Fix error return code in live_parallel_switch()
The error code returned from intel_context_create() should be propagated
instead of 0, as done elsewhere in this function.

Fixes: 50d16d44cce4 ("drm/i915/selftests: Exercise context switching in parallel")
Reported-by: Hulk Robot
Signed-off-by: Zhen Lei
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 5fef592390cb..7db9e31da385 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -338,8 +338,10 @@ static int live_parallel_switch(void *arg)
 				continue;

 			ce = intel_context_create(data[m].ce[0]->engine);
-			if (IS_ERR(ce))
+			if (IS_ERR(ce)) {
+				err = PTR_ERR(ce);
 				goto out;
+			}

 			err = intel_context_pin(ce);
 			if (err) {
-- 
2.25.1
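The fix relies on the kernel's error-pointer convention. A pointer-returning function encodes a negative errno in the top MAX_ERRNO addresses. The caller must decode it with PTR_ERR() before jumping to cleanup; otherwise `err` keeps its previous value (here 0), and the selftest reports success.

Below is a simplified user-space re-creation of IS_ERR()/ERR_PTR()/PTR_ERR() modeled on include/linux/err.h. The hypothetical `ctx_create()` stands in for intel_context_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space version of the include/linux/err.h helpers:
 * the last MAX_ERRNO addresses encode -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct ctx { int id; };

/* Hypothetical constructor that fails with -ENOMEM on request. */
static struct ctx *ctx_create(bool fail)
{
	struct ctx *c;

	if (fail)
		return ERR_PTR(-ENOMEM);
	c = malloc(sizeof(*c));
	return c ? c : ERR_PTR(-ENOMEM);
}

/* The fixed pattern: decode the error into err before "goto out",
 * so the caller sees -ENOMEM instead of a stale 0. */
static int use_ctx(bool fail)
{
	struct ctx *c;
	int err = 0;

	c = ctx_create(fail);
	if (IS_ERR(c)) {
		err = PTR_ERR(c);
		goto out;
	}
	free(c);
out:
	return err;
}
```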
Re: [PATCH v9 07/10] mm: Device exclusive memory access
On Thursday, 27 May 2021 11:04:57 PM AEST Peter Xu wrote: > On Thu, May 27, 2021 at 01:35:39PM +1000, Alistair Popple wrote: > > > > + * > > > > + * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device > > > > will > > > > no + * longer have exclusive access to the page. May ignore the > > > > invalidation that's + * part of make_device_exclusive_range() if the > > > > owner field > > > > + * matches the value passed to make_device_exclusive_range(). > > > > > > Perhaps s/matches/does not match/? > > > > No, "matches" is correct. The MMU_NOTIFY_EXCLUSIVE notifier is to notify a > > listener that a range is being invalidated for the purpose of making the > > range available for some device to have exclusive access to. Which does > > also mean a device getting the notification no longer has exclusive > > access if it already did. > > > > A unique type is needed because when creating the range a driver needs to > > form a mmu critical section (with mmu_interval_read_begin()/ > > mmu_interval_read_end()) to ensure the entry remains valid long enough to > > program the device pte and hasn't been invalidated. > > > > However, without a way of filtering, any invalidation will result in a > > retry, but make_device_exclusive_range() needs to do an invalidation > > during installation of the entry. To avoid this causing infinite retries > > the driver ignores specific invalidation events that it knows don't > > apply, i.e. the invalidations that are a result of that driver asking for > > device exclusive entries. > > OK I think I get it now.. so the driver checks both EXCLUSIVE and owner, if > all match it skips the notify, otherwise it's treated like all the rest. > Thanks. > > However then it's still confusing (as I raised it too in previous comment) > that we use CLEAR when re-installing the valid pte. It's merely against > what CLEAR means. Oh, thanks. I understand where you are coming from now - the pte is already invalid so ordinarily wouldn't need clearing.
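The filtering described above can be sketched as a single predicate. An invalidation is skipped only when it is the EXCLUSIVE kind and carries this driver's own owner value. Everything else (other event types, other owners, or no owner at all) is treated like any other invalidation and forces a retry of the critical section.

The enum, struct and `driver_may_skip()` below are simplified, hypothetical stand-ins, not the real mmu_notifier types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mmu_notifier event types. */
enum notify_event { EVENT_CLEAR, EVENT_UNMAP, EVENT_EXCLUSIVE };

struct notify_range {
	enum notify_event event;
	const void *owner; /* owner value carried by the invalidation */
};

/* True only for the invalidation this driver itself triggered while
 * installing device-exclusive entries; all others must be honoured. */
static bool driver_may_skip(const struct notify_range *r, const void *self)
{
	return r->event == EVENT_EXCLUSIVE && r->owner == self;
}
```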
> How about sending EXCLUSIVE for both mark/restore? Just that when restore > we notify with owner==NULL telling that no one is owning it anymore so > driver needs to drop the ownership. I assume your driver patch does not > need change too. Would that be much cleaner than CLEAR? I bet it also > makes commenting the new notify easier. > > What do you think? That seems like a good idea and avoids adding another type. And as you say, the driver patch shouldn't need changing either (will need to confirm though). > [...] > > > > > + vma->vm_mm, address, > > > > min(vma->vm_end, > > > > + address + page_size(page)), > > > > args->owner); + mmu_notifier_invalidate_range_start(&range); > > > > + > > > > + while (page_vma_mapped_walk(&pvmw)) { > > > > + /* Unexpected PMD-mapped THP? */ > > > > + VM_BUG_ON_PAGE(!pvmw.pte, page); > > > > + > > > > + if (!pte_present(*pvmw.pte)) { > > > > + ret = false; > > > > + page_vma_mapped_walk_done(&pvmw); > > > > + break; > > > > + } > > > > + > > > > + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); > > > > > > I see that all pages passed in should be done after FOLL_SPLIT_PMD, so > > > is > > > this needed? Or say, should subpage==page always be true? > > > > Not always, in the case of a thp there are small ptes which will get > > device > > exclusive entries. > > FOLL_SPLIT_PMD will first split the huge thp into smaller pages, then do > follow_page_pte() on them (in follow_pmd_mask): > > if (flags & FOLL_SPLIT_PMD) { > int ret; > page = pmd_page(*pmd); > if (is_huge_zero_page(page)) { > spin_unlock(ptl); > ret = 0; > split_huge_pmd(vma, pmd, address); > if (pmd_trans_unstable(pmd)) > ret = -EBUSY; > } else { > spin_unlock(ptl); > split_huge_pmd(vma, pmd, address); > ret = pte_alloc(mm, pmd) ? -ENOMEM : 0; > } > > return ret ? ERR_PTR(ret) : > follow_page_pte(vma, address, pmd, flags, > &ctx->pgmap); } > > So I thought all pages are small pages?
The page will remain as a transparent huge page though (at least as I understand things). FOLL_SPLIT_PMD turns it into a pte-mapped thp by splitting the pmd and creating ptes mapping the subpages, but doesn't split the page itself. For comparison, FOLL_SPLIT (which has been removed in v5.13 due to lack of use) is what would be used to split the page in the above GUP code, by calling split_huge_page() rather than split_huge_pmd(). This was done to avoid adding code for handling device exclusive entries at the pmd level as
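To make the subpage arithmetic concrete: after FOLL_SPLIT_PMD the THP is still one compound page, but it is now mapped by one PTE per base page. That is 512 PTEs for a 2 MiB THP with 4 KiB pages, as on x86-64. Each iteration of page_vma_mapped_walk() can therefore see a different pfn. The expression `page - page_to_pfn(page) + pte_pfn(*pvmw.pte)` turns that pfn back into the matching struct page. It relies on the struct pages covering the THP being contiguous in memory.

The sketch below models this with a plain array. `struct page_desc`, `memmap`, `base_pfn`, `page_to_pfn_sketch()` and `subpage_for_pte()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat "memmap": descriptor i describes pfn base_pfn + i. */
struct page_desc { int dummy; };

#define NR_PAGES 1024
static struct page_desc memmap[NR_PAGES];
static const unsigned long base_pfn = 0x10000;

static unsigned long page_to_pfn_sketch(const struct page_desc *p)
{
	return base_pfn + (unsigned long)(p - memmap);
}

/* Same arithmetic as "page - page_to_pfn(page) + pte_pfn(pte)", written as
 * ref + signed offset so the pointer never leaves the array. */
static struct page_desc *subpage_for_pte(struct page_desc *ref,
					 unsigned long pte_pfn)
{
	long delta = (long)pte_pfn - (long)page_to_pfn_sketch(ref);

	return ref + delta;
}
```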
Re: [Intel-gfx] [PATCH 15/18] drm/i915/guc: Ensure H2G buffer updates visible before tail update
On 5/26/2021 10:58, Matthew Brost wrote: On Wed, May 26, 2021 at 02:36:18PM +0200, Michal Wajdeczko wrote: On 26.05.2021 08:42, Matthew Brost wrote: Ensure H2G buffer updates are visible before descriptor tail updates by inserting a barrier between the H2G buffer update and the tail. The barrier is simple wmb() for SMEM and is register write for LMEM. This is needed if more than 1 H2G can be inflight at once. Signed-off-by: Matthew Brost Cc: Michal Wajdeczko --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index fb875d257536..42063e1c355d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -328,6 +328,18 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct) return ++ct->requests.last_fence; } +static void write_barrier(struct intel_guc_ct *ct) { + struct intel_guc *guc = ct_to_guc(ct); + struct intel_gt *gt = guc_to_gt(guc); + + if (i915_gem_object_is_lmem(guc->ct.vma->obj)) { + GEM_BUG_ON(guc->send_regs.fw_domains); + intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0); hmm, as this is one of the GuC scratch registers used for H2G MMIO communication, writing 0 there might be interpreted by the GuC as new request with action=0 and might result in extra processing/logging on GuC side, and, since from here we don't protect access to this register by send_mutex, we can corrupt other MMIO message being prepared from different thread, ... can't we use other register ? Hmm, this code has been internal for a long time and we haven't seen any issues. MMIOs are always attempted to be processed each interrupt and then CTBs are processed next. A value of 0 in scratch0 results in no MMIOs being processed as a value of 0 is a reserved action which translates to a NOP. Also in the current i915 once CTBs are enabled MMIOs are never used.
That being said, I think once we transition to the new interface + enable suspend on a VF MMIOs might be used. With that I propose that we merge this as is with a comment saying if we ever mix CTBs and MMIOs we need to find another MMIO register. I don't think changing this now is worth delaying upstreaming this and also any change we make now will make us lose confidence in code that has been thoroughly tested. Matt This was discussed in chat while inspecting the GuC firmware code. Writing zero to the scratch does indeed not trigger any extra processing of spurious MMIO H2Gs. The register is indeed always checked when the host triggers a CTB H2G, but zero counts as invalid and thus will be skipped. So with a comment about not mixing CTB and MMIOs, I think we are good for now. It seems unlikely that MMIOs & CTB would be mixed. MMIOs are only used for initialisation operations and should not be necessary once the CTBs are up and running. If mixing does occur in the future, it sounds like something that should be addressed at the GuC architecture level! With the comment added: Reviewed-by: John Harrison + } else { + wmb(); + } +} + /** * DOC: CTB Host to GuC request * @@ -411,6 +423,12 @@ static int ct_write(struct intel_guc_ct *ct, } GEM_BUG_ON(tail > size); + /* +* make sure H2G buffer update and LRC tail update (if this triggering a +* submission) are visible before updating the descriptor tail +*/ + write_barrier(ct); + /* now update desc tail (back in bytes) */ desc->tail = tail * 4; return 0; ___ Intel-gfx mailing list intel-...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
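The ordering requirement in this patch is the classic single-producer pattern:
1. write the payload;
2. make it visible;
3. publish the new tail that tells the consumer where valid data ends.

The sketch below is a CPU-only analogue using C11 atomics, assuming a hypothetical `struct ring` and `ring_write()`. The release fence plays the role that wmb() plays on the SMEM path.

The sketch does not cover everything in the patch:
- It does not model the LMEM path, where the patch performs a register write instead.
- C11 fences order memory between CPU threads, not towards a device such as the GuC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_DWORDS 64u

/* Hypothetical single-producer ring shared with a consumer. */
struct ring {
	uint32_t buf[RING_DWORDS];
	_Atomic uint32_t tail; /* in dwords; the consumer polls this */
};

/* Copy a message into the ring, then publish the new tail. The release
 * fence orders the payload stores before the tail store. */
static void ring_write(struct ring *r, const uint32_t *msg, uint32_t len)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	for (uint32_t i = 0; i < len; i++)
		r->buf[(tail + i) % RING_DWORDS] = msg[i];

	atomic_thread_fence(memory_order_release); /* ~ wmb() for SMEM */
	atomic_store_explicit(&r->tail, (tail + len) % RING_DWORDS,
			      memory_order_relaxed);
}
```

A consumer thread would pair this with an acquire load of `tail` before reading `buf`. Without the fence, a weakly ordered CPU could make the new tail visible before the payload it points past.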
[PATCH v4 1/1] drm/i915/dg1: Add HWMON power sensor support
As part of the System Management Interface (SMI), use the HWMON subsystem to display power utilization. The following standard HWMON power sensors are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input - power1_cap - power1_max Some non-standard HWMON power information is also provided, such as enable bits and intervals. Signed-off-by: Dale B Stimson --- .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++ drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.c | 6 + drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_hwmon.c | 757 ++ drivers/gpu/drm/i915/i915_hwmon.h | 42 + drivers/gpu/drm/i915/i915_reg.h | 52 ++ 8 files changed, 978 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon new file mode 100644 index 0..2ee7c413ca190 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon @@ -0,0 +1,116 @@ +What: /sys/devices/.../hwmon/hwmon/energy1_input +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RO. Energy input of device in microjoules. + + The returned textual representation is an unsigned integer + number that can be stored in 64 bits. Warning: The hardware + register is 32 bits wide and can overflow by wrapping around. + A single wrap-around between calls to read this value can + be detected and will be accounted for in the returned value. + At a power consumption of 1 watt, the 32-bit hardware register + would wrap around approximately every 3 days. + + Only supported for particular Intel i915 graphics platforms.
+ +What: /sys/devices/.../hwmon/hwmon/power1_max_enable +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit is enabled - true or false. + +The power controller will throttle the operating frequency +if the power averaged over a window (typically seconds) +exceeds this limit. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_max +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit in milliwatts + +The power controller will throttle the operating frequency +if the power averaged over a window (typically seconds) +exceeds this limit. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_max_interval +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit interval in milliseconds over +which sustained power is averaged. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_cap_enable +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: + RW. Power burst limit is enabled - true or false + +See power1_cap_enable power1_cap + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_cap +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: + RW. Power burst limit in milliwatts. + +See power1_cap_enable power1_cap + + Only supported for particular Intel i915 graphics platforms. 
+ +What: /sys/devices/.../hwmon/hwmon/power_default_limit +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RO. Default power limit. + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power_min_limit +Date: June 2021 +KernelVersion: 5.14 +Con
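The wrap-around handling described for energy1_input can be sketched in two steps. Unsigned 32-bit subtraction yields the correct delta across a single wrap, and that delta is accumulated into a 64-bit total. Two or more wraps between reads cannot be detected this way.

The same arithmetic explains the "approximately every 3 days" figure under one assumption: that one register count is 2^-14 J. The documentation above does not state the register unit. Under that assumption, 2^32 counts equal 2^18 J = 262,144 J, which lasts about 262,144 s, or roughly 3.03 days, at 1 W.

`struct energy_acc`, `energy_update()` and `energy_to_uj()` are hypothetical helpers, not code from i915_hwmon.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator; not the i915 driver's actual state. */
struct energy_acc {
	uint32_t last_raw;  /* previous 32-bit register reading */
	uint64_t total_raw; /* 64-bit running total, in hardware units */
};

/* Fold in a new reading; modular 32-bit subtraction absorbs one wrap. */
static void energy_update(struct energy_acc *acc, uint32_t raw)
{
	uint32_t delta = raw - acc->last_raw;

	acc->total_raw += delta;
	acc->last_raw = raw;
}

/* Convert to microjoules, assuming one unit = 2^-shift J (shift < 44).
 * Splitting whole and fractional parts avoids overflowing raw * 10^6. */
static uint64_t energy_to_uj(uint64_t raw, unsigned int shift)
{
	uint64_t whole = raw >> shift;
	uint64_t frac = raw & ((UINT64_C(1) << shift) - 1);

	return whole * 1000000u + ((frac * 1000000u) >> shift);
}
```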
[PATCH v4 0/1] drm/i915/dg1: Add HWMON power sensor support
drm/i915/dg1: Add HWMON power support As part of the System Management Interface (SMI), use the HWMON subsystem to display power utilization. The following standard HWMON entries are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input - power1_cap - power1_max Some non-standard HWMON power information is also provided, such as enable bits and intervals. - v4 Commit message minor rewording v4 Move call to i915_hwmon_register() to a more appropriate location, so that it is done after intel_gt_driver_register(). The call to i915_perf_unregister() is moved correspondingly. v4 The proper register to read energy status is PCU_PACKAGE_ENERGY_STATUS. v4 Attribute power1_max_enable is read-only. v3 Added documentation of these hwmon attributes in file Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon v3 Commit message minor rewording v3 Function name changes: i915_hwmon_init() -> i915_hwmon_register() i915_hwmon_fini() -> i915_hwmon_unregister() v3 i915_hwmon_register and i915_hwmon_unregister now take arg i915. v3 i915_hwmon_register() now returns void instead of int. v3 Macro FIELD_SHIFT() added to compute shift value from constant field mask. v3 Certain functions no longer require "inline" due to addition of new parameter field_shift, allowing access to constant expressions for the field mask at each call site. These functions now do field access via shift and masking and no longer use le32*() functions (as le32*() required a local constant expression for the mask). _field_read_and_scale() _field_read64_and_scale() _field_scale_and_write() v3 Some comments were modified. v3 Now using sysfs_emit() instead of scnprintf(). V2 Rename local function parameter field_mask to field_msk in order to avoid shadowing the name of function field_mask() from include/linux/bitfield.h. V2 Change a comment introduction from "/**" to "/*", as it is not intended to match a pattern that triggers documentation.
Reported-by: kernel test robot V2 Slight movement of calls: - i915_hwmon_init slightly later, after call to i915_setup_sysfs() - i915_hwmon_fini slightly earlier, before i915_teardown_sysfs() V2 Fixed some strong typing issues with le32 functions. Detected by sparse in a run by kernel test robot: Reported-by: kernel test robot Dale B Stimson (1): drm/i915/dg1: Add HWMON power sensor support .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++ drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.c | 6 + drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_hwmon.c | 757 ++ drivers/gpu/drm/i915/i915_hwmon.h | 42 + drivers/gpu/drm/i915/i915_reg.h | 52 ++ 8 files changed, 978 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h Range-diff against v3: 1: ed34d683a0ef1 ! 1: bc8bd78b2c006 drm/i915/dg1: Add HWMON power support @@ Metadata Author: Dale B Stimson ## Commit message ## -drm/i915/dg1: Add HWMON power support +drm/i915/dg1: Add HWMON power sensor support As part of the System Managemenent Interface (SMI), use the HWMON subsystem to display power utilization. 
-The following standard HWMON entries are currently supported +The following standard HWMON power sensors are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input @@ drivers/gpu/drm/i915/i915_drv.c #include "i915_irq.h" #include "i915_memcpy.h" @@ drivers/gpu/drm/i915/i915_drv.c: static void i915_driver_register(struct drm_i915_private *dev_priv) - i915_debugfs_register(dev_priv); - i915_setup_sysfs(dev_priv); + + intel_gt_driver_register(&dev_priv->gt); + i915_hwmon_register(dev_priv); + - /* Depends on sysfs having been initialized */ - i915_perf_register(dev_priv); + intel_display_driver_register(dev_priv); + intel_power_domains_enable(dev_priv); @@ drivers/gpu/drm/i915/i915_drv.c: static void i915_driver_unregister(struct drm_i915_private *dev_priv) + + intel_display_driver_unregister(dev_priv); + ++ i915_hwmon_unregister(dev_priv); ++ intel_gt_driver_unregister(&dev_priv->gt); i915_perf_unregister(dev_priv); -+ -+ i915_hwmon_unregister(dev_priv); + i915_pmu_unregister(dev_priv); @@ drivers/gpu/drm/i915/i915_hwmon.c (new) + + with_intel_runtime_pm(unc
Re: [PATCH 11/11] drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default
On Fri, May 21, 2021 at 11:10 AM Daniel Vetter wrote: > Goes through all the drivers and deletes the default hook since it's > the default now. > > Signed-off-by: Daniel Vetter > Cc: Joel Stanley > Cc: Andrew Jeffery > Cc: "Noralf Trønnes" > Cc: Linus Walleij > Cc: Emma Anholt > Cc: David Lechner > Cc: Kamlesh Gurudasani > Cc: Oleksandr Andrushchenko > Cc: Daniel Vetter > Cc: Maxime Ripard > Cc: Thomas Zimmermann > Cc: Sam Ravnborg > Cc: Alex Deucher > Cc: Andy Shevchenko > Cc: linux-asp...@lists.ozlabs.org > Cc: linux-arm-ker...@lists.infradead.org > Cc: xen-de...@lists.xenproject.org Acked-by: Linus Walleij Yours, Linus Walleij
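Dropping the hook from every driver only works if the core falls back to the shared helper when a driver leaves the callback NULL, which is what the subject line says. Below is a minimal userspace sketch of that dispatch pattern. All names (`pipe_funcs`, `pipe_prepare_fb`, `default_prepare_fb`) are hypothetical stand-ins, not the DRM simple-KMS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real drm_simple_display_pipe types. */
struct plane_state { int fb_id; };

struct pipe_funcs {
	/* Optional hook; NULL means "use the helper default". */
	int (*prepare_fb)(struct plane_state *state);
};

/* Models the shared default helper that drivers used to wire up by hand. */
static int default_prepare_fb(struct plane_state *state)
{
	(void)state;
	return 1; /* marker so callers can tell which path ran */
}

/* A driver that still needs its own behaviour keeps its hook. */
static int custom_prepare_fb(struct plane_state *state)
{
	(void)state;
	return 2;
}

/* Call the driver hook if present, otherwise fall back to the default. */
static int pipe_prepare_fb(const struct pipe_funcs *funcs,
			   struct plane_state *state)
{
	assert(funcs != NULL);
	if (funcs->prepare_fb)
		return funcs->prepare_fb(state);
	return default_prepare_fb(state);
}
```

With the fallback in the core, a driver whose hook was identical to the default can simply delete that line from its funcs table.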
Re: [PATCH v8 04/11] dt-bindings: drm/aux-bus: Add an example
On Tue, May 25, 2021 at 2:02 AM Douglas Anderson wrote: > Now that we have an eDP controller that lists aux-bus, we can safely > add an example to the aux-bus bindings. > > NOTE: this example is just a copy of the one in the 'ti-sn65dsi86' > one. It feels useful to have the example in both places simply because > it's important to document the interaction between the two bindings in > both places. > > Signed-off-by: Douglas Anderson Looks good. Reviewed-by: Linus Walleij Yours, Linus Walleij
Re: [PATCH v8 03/11] dt-bindings: drm/bridge: ti-sn65dsi86: Add aux-bus child
On Tue, May 25, 2021 at 2:02 AM Douglas Anderson wrote:
> The patch ("dt-bindings: drm: Introduce the DP AUX bus") talks about
> how using the DP AUX bus is better than learning how to slice
> bread. Let's add it to the ti-sn65dsi86 bindings.
>
> Signed-off-by: Douglas Anderson

(...)

> description: See ../../pwm/pwm.yaml for description of the cell formats.

Just use the full path: /schemas/pwm/pwm.yaml

> + aux-bus:
> +$ref: ../dp-aux-bus.yaml#

Use the full path. (Same method as above) This removes the need for ../../... You do it here:

>ports:
> $ref: /schemas/graph.yaml#/properties/ports

Other than that I think it looks all right!

Yours,
Linus Walleij
Re: [PATCH 2/2] drm/vc4: hdmi: Convert to gpiod
On Mon, May 24, 2021 at 3:19 PM Maxime Ripard wrote: > The new gpiod interface takes care of parsing the GPIO flags and to > return the logical value when accessing an active-low GPIO, so switching > to it simplifies a lot the driver. > > Signed-off-by: Maxime Ripard Thanks for fixing this! Reviewed-by: Linus Walleij Yours, Linus Walleij
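The simplification comes from the descriptor remembering the active-low flag, so the driver only ever sees logical values. In the real consumer API, gpiod_get_value() returns the logical value and gpiod_get_raw_value() the physical line level. The sketch below models only that translation in userspace; `gpio_desc_model` and its helpers are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of what the gpiod consumer API does for the caller:
 * the active-low flag parsed from firmware is stored with the descriptor,
 * and get/set operate on logical values (1 = asserted).
 */
struct gpio_desc_model {
	bool active_low; /* parsed once from the GPIO flags */
	int raw;         /* physical line level, 0 or 1 */
};

/* Logical read: invert the physical level for active-low lines. */
static int model_get_value(const struct gpio_desc_model *d)
{
	assert(d->raw == 0 || d->raw == 1);
	return d->active_low ? !d->raw : d->raw;
}

/* Logical write: translate back to the physical level. */
static void model_set_value(struct gpio_desc_model *d, int value)
{
	int logical = value ? 1 : 0;

	d->raw = d->active_low ? !logical : logical;
}
```

With the legacy integer GPIO API, the driver had to parse the flags and apply the inversion itself at every access.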
[PATCH v4 1/1] drm/i915/dg1: Add HWMON power sensor support
As part of the System Management Interface (SMI), use the HWMON subsystem to display power utilization. The following standard HWMON power sensors are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input - power1_cap - power1_max Some non-standard HWMON power information is also provided, such as enable bits and intervals. Signed-off-by: Dale B Stimson --- .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++ drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.c | 6 + drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_hwmon.c | 757 ++ drivers/gpu/drm/i915/i915_hwmon.h | 42 + drivers/gpu/drm/i915/i915_reg.h | 52 ++ 8 files changed, 978 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon new file mode 100644 index 0..2ee7c413ca190 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon @@ -0,0 +1,116 @@ +What: /sys/devices/.../hwmon/hwmon/energy1_input +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RO. Energy input of device in microjoules. + + The returned textual representation is an unsigned integer + number that can be stored in 64-bits. Warning: The hardware + register is 32-bits wide and can overflow by wrapping around. + A single wrap-around between calls to read this value can + be detected and will be accounted for in the returned value. + At a power consumption of 1 watt, the 32-bit hardware register + would wrap-around approximately every 3 days. + + Only supported for particular Intel i915 graphics platforms.
+ +What: /sys/devices/.../hwmon/hwmon/power1_max_enable +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit is enabled - true or false. + +The power controller will throttle the operating frequency +if the power averaged over a window (typically seconds) +exceeds this limit. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_max +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit in milliwatts + +The power controller will throttle the operating frequency +if the power averaged over a window (typically seconds) +exceeds this limit. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_max_interval +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RW. Sustained power limit interval in milliseconds over +which sustained power is averaged. + +See power1_max_enable power1_max power1_max_interval + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_cap_enable +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: + RW. Power burst limit is enabled - true or false + +See power1_cap_enable power1_cap + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power1_cap +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: + RW. Power burst limit in milliwatts. + +See power1_cap_enable power1_cap + + Only supported for particular Intel i915 graphics platforms. 
+ +What: /sys/devices/.../hwmon/hwmon/power_default_limit +Date: June 2021 +KernelVersion: 5.14 +Contact:dri-devel@lists.freedesktop.org +Description: +RO. Default power limit. + + Only supported for particular Intel i915 graphics platforms. + +What: /sys/devices/.../hwmon/hwmon/power_min_limit +Date: June 2021 +KernelVersion: 5.14 +Con
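The energy1_input description says a single wrap-around of the 32-bit register between two reads is detected and accounted for. One common way to do that is unsigned modular subtraction into a 64-bit total, sketched below. The names (`energy_acc`, `counts_to_uj`) are hypothetical, and the 2^-14 J energy unit used in the example is an assumption for illustration, not something this patch states. Under that assumption, 2^32 counts equal 2^18 J = 262144 J, about 72.8 hours at 1 W, which agrees with the "approximately every 3 days" figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical accumulator: extends a free-running 32-bit hardware energy
 * counter into a 64-bit running total. Unsigned 32-bit subtraction gives the
 * correct delta even if the counter wrapped once between two reads.
 */
struct energy_acc {
	uint32_t last_raw;  /* previous hardware reading */
	uint64_t total_raw; /* accumulated counts since init */
};

static void energy_acc_init(struct energy_acc *acc, uint32_t first_raw)
{
	acc->last_raw = first_raw;
	acc->total_raw = 0;
}

static uint64_t energy_acc_update(struct energy_acc *acc, uint32_t raw)
{
	uint32_t delta = raw - acc->last_raw; /* modulo 2^32 */

	acc->last_raw = raw;
	acc->total_raw += delta;
	return acc->total_raw;
}

/*
 * Convert counts in units of 2^-unit_shift J to microjoules.
 * Overflows for very large totals; kernel code would use a 64x32 helper.
 */
static uint64_t counts_to_uj(uint64_t counts, unsigned int unit_shift)
{
	return (counts * 1000000ull) >> unit_shift;
}
```

Two or more wraps between reads cannot be distinguished from one, which is why the ABI text only promises to handle a single wrap-around.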
[PATCH v4 0/1] drm/i915/dg1: Add HWMON power sensor support
drm/i915/dg1: Add HWMON power support As part of the System Management Interface (SMI), use the HWMON subsystem to display power utilization. The following standard HWMON entries are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input - power1_cap - power1_max Some non-standard HWMON power information is also provided, such as enable bits and intervals. - v4 Commit message minor rewording v4 Move call to i915_hwmon_register() to a more appropriate location, so that it is done after intel_gt_driver_register(). The call to i915_perf_unregister() is moved correspondingly. v4 The proper register to read energy status is PCU_PACKAGE_ENERGY_STATUS. v4 Attribute power1_max_enable is read-only. v3 Added documentation of these hwmon attributes in file Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon v3 Commit message minor rewording v3 Function name changes: i915_hwmon_init() -> i915_hwmon_register() i915_hwmon_fini() -> i915_hwmon_unregister() v3 i915_hwmon_register and i915_hwmon_unregister now take arg i915. v3 i915_hwmon_register() now returns void instead of int. v3 Macro FIELD_SHIFT() added to compute shift value from constant field mask. v3 Certain functions no longer require "inline" due to addition of new parameter field_shift, allowing access to constant expressions for the field mask at each call site. These functions now do field access via shift and masking and no longer use le32*() functions (as le32*() required a local constant expression for the mask). _field_read_and_scale() _field_read64_and_scale() _field_scale_and_write() v3 Some comments were modified. v3 Now using sysfs_emit() instead of scnprintf(). V2 Rename local function parameter field_mask to field_msk in order to avoid shadowing the name of function field_mask() from include/linux/bitfield.h. V2 Change a comment introduction from "/**" to "/*", as it is not intended to match a pattern that triggers documentation.
Reported-by: kernel test robot V2 Slight movement of calls: - i915_hwmon_init slightly later, after call to i915_setup_sysfs() - i915_hwmon_fini slightly earlier, before i915_teardown_sysfs() V2 Fixed some strong typing issues with le32 functions. Detected by sparse in a run by kernel test robot: Reported-by: kernel test robot Dale B Stimson (1): drm/i915/dg1: Add HWMON power sensor support .../ABI/testing/sysfs-driver-intel-i915-hwmon | 116 +++ drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.c | 6 + drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_hwmon.c | 757 ++ drivers/gpu/drm/i915/i915_hwmon.h | 42 + drivers/gpu/drm/i915/i915_reg.h | 52 ++ 8 files changed, 978 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon create mode 100644 drivers/gpu/drm/i915/i915_hwmon.c create mode 100644 drivers/gpu/drm/i915/i915_hwmon.h Range-diff against v3: 1: ed34d683a0ef1 ! 1: bc8bd78b2c006 drm/i915/dg1: Add HWMON power support @@ Metadata Author: Dale B Stimson ## Commit message ## -drm/i915/dg1: Add HWMON power support +drm/i915/dg1: Add HWMON power sensor support As part of the System Managemenent Interface (SMI), use the HWMON subsystem to display power utilization. 
-The following standard HWMON entries are currently supported +The following standard HWMON power sensors are currently supported (and appropriately scaled): /sys/class/drm/card0/device/hwmon/hwmon - energy1_input @@ drivers/gpu/drm/i915/i915_drv.c #include "i915_irq.h" #include "i915_memcpy.h" @@ drivers/gpu/drm/i915/i915_drv.c: static void i915_driver_register(struct drm_i915_private *dev_priv) - i915_debugfs_register(dev_priv); - i915_setup_sysfs(dev_priv); + + intel_gt_driver_register(&dev_priv->gt); + i915_hwmon_register(dev_priv); + - /* Depends on sysfs having been initialized */ - i915_perf_register(dev_priv); + intel_display_driver_register(dev_priv); + intel_power_domains_enable(dev_priv); @@ drivers/gpu/drm/i915/i915_drv.c: static void i915_driver_unregister(struct drm_i915_private *dev_priv) + + intel_display_driver_unregister(dev_priv); + ++ i915_hwmon_unregister(dev_priv); ++ intel_gt_driver_unregister(&dev_priv->gt); i915_perf_unregister(dev_priv); -+ -+ i915_hwmon_unregister(dev_priv); + i915_pmu_unregister(dev_priv); @@ drivers/gpu/drm/i915/i915_hwmon.c (new) + + with_intel_runtime_pm(unc
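The v3 notes describe a FIELD_SHIFT() macro that derives a shift from a constant field mask, with helpers such as _field_read_and_scale() doing plain shift-and-mask access. Below is a minimal userspace sketch of that idea. The macro body (GCC/Clang `__builtin_ctz()`), the helper signature and the scale parameters are assumptions, not the patch's code. In-kernel code would normally reach for FIELD_GET()/FIELD_PREP() from <linux/bitfield.h>.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical FIELD_SHIFT(): bit position of the lowest set bit of a
 * non-zero mask (GCC/Clang builtin; undefined for mask == 0).
 */
#define FIELD_SHIFT(mask) ((unsigned int)__builtin_ctz(mask))

/*
 * Sketch in the spirit of _field_read_and_scale(): extract a field with
 * shift-and-mask, then convert from hardware units of 2^-scale_shift to the
 * caller's units by multiplying by scale_factor.
 */
static uint64_t field_read_and_scale(uint32_t reg, uint32_t mask,
				     unsigned int shift, uint32_t scale_factor,
				     unsigned int scale_shift)
{
	assert(mask != 0);
	uint32_t field = (reg & mask) >> shift;

	return ((uint64_t)field * scale_factor) >> scale_shift;
}
```

For example, with an assumed power unit of 1/8 W, a raw field of 8000 scales to 1000 W, i.e. 1000000 mW, matching the milliwatt units in the ABI file.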
Re: [RFC PATCH 03/13] drm/msm/disp/dpu1: Add support for DSC
On 21/05/2021 15:49, Vinod Koul wrote: Display Stream Compression (DSC) is one of the hw blocks in dpu, so add support by adding hw blocks for DSC Signed-off-by: Vinod Koul --- drivers/gpu/drm/msm/Makefile | 1 + .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h| 26 +++ drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c| 221 ++ drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.h| 79 +++ drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h | 13 ++ 5 files changed, 340 insertions(+) create mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c create mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.h diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 610d630326bb..fd8fc57f1f58 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -61,6 +61,7 @@ msm-y := \ disp/dpu1/dpu_hw_blk.o \ disp/dpu1/dpu_hw_catalog.o \ disp/dpu1/dpu_hw_ctl.o \ + disp/dpu1/dpu_hw_dsc.o \ disp/dpu1/dpu_hw_interrupts.o \ disp/dpu1/dpu_hw_intf.o \ disp/dpu1/dpu_hw_lm.o \ diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h index 4dfd8a20ad5c..a699633f7013 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h @@ -165,6 +165,7 @@ enum { * @DPU_PINGPONG_TE2Additional tear check block for split pipes * @DPU_PINGPONG_SPLIT PP block supports split fifo * @DPU_PINGPONG_SLAVE PP block is a suitable slave for split fifo + * @DPU_PINGPONG_DSCDisplay stream compression blocks PP block supports DSC compression? Also you don't seem to set it anywhere. Do we have hardware w/o DSC support? 
* @DPU_PINGPONG_DITHER,Dither blocks * @DPU_PINGPONG_MAX */ @@ -173,10 +174,21 @@ enum { DPU_PINGPONG_TE2, DPU_PINGPONG_SPLIT, DPU_PINGPONG_SLAVE, + DPU_PINGPONG_DSC, DPU_PINGPONG_DITHER, DPU_PINGPONG_MAX }; +/** + * DSC sub-blocks + * @DPU_DSCDSC sub block + * @DPU_DSC_MAX + */ +enum { + DPU_DSC = 0x1, + DPU_DSC_MAX +}; + Unused /** * CTL sub-blocks * @DPU_CTL_SPLIT_DISPLAY CTL supports video mode split display @@ -413,6 +425,7 @@ struct dpu_dspp_sub_blks { struct dpu_pingpong_sub_blks { struct dpu_pp_blk te; struct dpu_pp_blk te2; + struct dpu_pp_blk dsc; struct dpu_pp_blk dither; }; Unused @@ -547,6 +560,16 @@ struct dpu_merge_3d_cfg { const struct dpu_merge_3d_sub_blks *sblk; }; +/** + * struct dpu_dsc_cfg - information of DSC blocks + * @id enum identifying this block + * @base register offset of this block + * @features bit mask identifying sub-blocks/features + */ +struct dpu_dsc_cfg { + DPU_HW_BLK_INFO; +}; + /** * struct dpu_intf_cfg - information of timing engine blocks * @id enum identifying this block @@ -748,6 +771,9 @@ struct dpu_mdss_cfg { u32 merge_3d_count; const struct dpu_merge_3d_cfg *merge_3d; + u32 dsc_count; + struct dpu_dsc_cfg *dsc; + u32 intf_count; const struct dpu_intf_cfg *intf; diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c new file mode 100644 index ..8b8d0553709d --- /dev/null +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c @@ -0,0 +1,221 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2020, Linaro Limited + */ + +#include "dpu_kms.h" +#include "dpu_hw_catalog.h" +#include "dpu_hwio.h" +#include "dpu_hw_mdss.h" +#include "dpu_hw_dsc.h" + +#define DSC_COMMON_MODE0x000 +#define DSC_ENC 0X004 +#define DSC_PICTURE 0x008 +#define DSC_SLICE 0x00C +#define DSC_CHUNK_SIZE 0x010 +#define DSC_DELAY 0x014 +#define DSC_SCALE_INITIAL 0x018 +#define DSC_SCALE_DEC_INTERVAL 0x01C +#define DSC_SCALE_INC_INTERVAL 0x020 +#define DSC_FIRST_LINE_BPG_OFFSET 0x024 +#define 
DSC_BPG_OFFSET 0x028 +#define DSC_DSC_OFFSET 0x02C +#define DSC_FLATNESS0x030 +#define DSC_RC_MODEL_SIZE 0x034 +#define DSC_RC 0x038 +#define DSC_RC_BUF_THRESH 0x03C +#define DSC_RANGE_MIN_QP0x074 +#define DSC_RANGE_MAX_QP0x0B0 +#define DSC_RANGE_BPG_OFFSET0x0EC + +static void dpu_hw_dsc_disable(struct dpu_hw_dsc *dsc) +{ + struct dpu_hw_blk_reg_map *c = &dsc->hw; + + DPU_REG_WRITE(c, DSC_COMMON_MODE, 0); +} + +static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc, + struct msm_display_dsc_config *dsc, + u32 mode, bool ich_reset_override) +{ +
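The gaps between the last few offsets in the quoted dpu_hw_dsc.c are consistent with how DSC 1.1 structures rate control: 14 RC buffer thresholds followed by three arrays covering 15 RC ranges (min QP, max QP, BPG offset). The arithmetic below makes that explicit. The assumption that each array is a run of consecutive 32-bit registers is an inference from the offsets, not something the patch states, and `dsc_range_reg()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets copied from the quoted dpu_hw_dsc.c. */
#define DSC_RC_BUF_THRESH    0x03C
#define DSC_RANGE_MIN_QP     0x074
#define DSC_RANGE_MAX_QP     0x0B0
#define DSC_RANGE_BPG_OFFSET 0x0EC

/* Assumption: each array is a run of consecutive 32-bit registers. */
#define DSC_REG_STRIDE 4
#define DSC_NUM_RANGES 15

/* Number of 32-bit registers between two offsets. */
static unsigned int dsc_reg_count(uint32_t start, uint32_t end)
{
	assert(end > start && (end - start) % DSC_REG_STRIDE == 0);
	return (end - start) / DSC_REG_STRIDE;
}

/* Hypothetical helper: offset of entry i in one of the RC range arrays. */
static uint32_t dsc_range_reg(uint32_t base, unsigned int i)
{
	assert(i < DSC_NUM_RANGES);
	return base + i * DSC_REG_STRIDE;
}
```

If the layout holds, a config routine would write the 15 range entries in a loop over dsc_range_reg() rather than spelling out each offset.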
Re: [RFC PATCH 03/13] drm/msm/dsi: add support for dsc data
On 21/05/2021 15:49, Vinod Koul wrote: DSC needs some configuration from device tree, add support to read and store these params and add DSC structures in msm_drv Signed-off-by: Vinod Koul --- drivers/gpu/drm/msm/dsi/dsi_host.c | 170 + drivers/gpu/drm/msm/msm_drv.h | 32 ++ 2 files changed, 202 insertions(+) [skipped] DRM_DEV_ERROR(dev, "%s: invalid lane configuration %d\n", diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index 2668941df529..26661dd43936 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -70,6 +71,16 @@ enum msm_mdp_plane_property { #define MSM_GPU_MAX_RINGS 4 #define MAX_H_TILES_PER_DISPLAY 2 +/** + * enum msm_display_compression_type - compression method used for pixel stream + * @MSM_DISPLAY_COMPRESSION_NONE: Pixel data is not compressed + * @MSM_DISPLAY_COMPRESSION_DSC: DSC compresison is used + */ +enum msm_display_compression_type { + MSM_DISPLAY_COMPRESSION_NONE, + MSM_DISPLAY_COMPRESSION_DSC, +}; + Seems to be unused /** * enum msm_display_caps - features/capabilities supported by displays * @MSM_DISPLAY_CAP_VID_MODE: Video or "active" mode supported -- With best wishes Dmitry
Re: [Freedreno] [RFC PATCH 00/13] drm/msm: Add Display Stream Compression Support
On Wed, May 26, 2021 at 8:00 AM Jeffrey Hugo wrote: > > On Tue, May 25, 2021 at 11:46 PM Vinod Koul wrote: > > > > Hello Jeff, > > > > On 21-05-21, 08:09, Jeffrey Hugo wrote: > > > On Fri, May 21, 2021 at 6:50 AM Vinod Koul wrote: > > > > > > > > Display Stream Compression (DSC) compresses the display stream in host > > > > which > > > > is later decoded by panel. This series enables this for Qualcomm msm > > > > driver. > > > > This was tested on Google Pixel3 phone which use LGE SW43408 panel. > > > > > > > > The changes include adding DT properties for DSC then hardware blocks > > > > support > > > > required in DPU1 driver and support in encoder. We also add support in > > > > DSI > > > > and introduce required topology changes. > > > > > > > > In order for panel to set the DSC parameters we add dsc in drm_panel > > > > and set > > > > it from the msm driver. > > > > > > > > Complete changes which enable this for Pixel3 along with panel driver > > > > (not > > > > part of this series) and DT changes can be found at: > > > > git.linaro.org/people/vinod.koul/kernel.git pixel/dsc_rfc > > > > > > > > Comments welcome! > > > > > > This feels backwards to me. I've only skimmed this series, and the DT > > > changes didn't come through for me, so perhaps I have an incomplete > > > view. > > > > Not sure why, I see it on lore: > > https://lore.kernel.org/dri-devel/20210521124946.3617862-3-vk...@kernel.org/ > > > > > DSC is not MSM specific. There is a standard for it. Yet it looks > > > like everything is implemented in a MSM specific way, and then pushed > > > to the panel. So, every vendor needs to implement their vendor > > > specific way to get the DSC info, and then push it to the panel? > > > Seems wrong, given there is an actual standard for this feature. 
> > > > I have added slice and bpp info in the DT here under the host and then > > pass the generic struct drm_dsc_config to panel which allows panel to > > write the pps cmd > > > > Nothing above is MSM specific.. It can very well work with non MSM > > controllers too. > > I disagree. > > The DT bindings you defined (thanks for the direct link) are MSM > specific. I'm not talking (yet) about the properties you defined, but > purely from the stand point that you defined the binding within the > scope of the MSM dsi binding. No other vendor can use those bindings. > Of course, if we look at the properties themselves, they are prefixed > with "qcom", which is vendor specific. > > So, purely on the face of it, this is MSM specific. > > Assuming we want a DT solution for DSC, I think it should be something > like Documentation/devicetree/bindings/clock/clock-bindings.txt (the > first example that comes to mind), which is a non-vendor specific > generic set of properties that each vendor/device specific binding can > inherit. Panel has similar things. > > Specific to the properties, I don't much like that you duplicate BPP, > which is already associated with the panel (although perhaps not in > the scope of DT). What if the panel and your DSC bindings disagree? > Also, I guess I need to ask, have you read the DSC spec? Last I > looked, there were something like 3 dozen properties that could be > configured. You have five in your proposed binding. To me, this is > not a generic DSC solution, this is MSM specific (and frankly I don't > think this supports all the configuration the MSM hardware can do, > either). > > I'm surprised Rob Herring didn't have more to say on this. > > > I didn't envision DSC to be a specific thing, most of > > the patches here are hardware enabling ones for DSC bits for MSM > > hardware. > > > > > Additionally, we define panel properties (resolution, BPP, etc) at the > > > panel, and have the display drivers pull it from the panel. 
However, > > > for DSC, you do the reverse (define it in the display driver, and push > > > it to the panel). If the argument is that DSC properties can be > > > dynamic, well, so can resolution. Every panel for MSM MTPs supports > > > multiple resolutions, yet we define that with the panel in Linux. > > > > I dont have an answer for that right now, to start with yes the > > properties are in host but I am okay to discuss this and put wherever we > > feel is most correct thing. I somehow dont like that we should pull > > from panel DT and program host with that. Here using struct > > drm_dsc_config allows me to configure panel based on resolution passed > > I somewhat agree that pulling from the panel and programing the host > based on that is an odd solution, but we have it currently. Have a > look at Documentation/devicetree/bindings/display/panel in particular > panel-timing. All of that ends up informing the mdss programing > anyways (particularly the dsi and its phy). So my problem is that we > currently have a solution that seems to just need to be extended, and > instead you have proposed a completely different solution which is > arguably contradictory. > > However, I'd l
[PATCH 11/11] drm/ingenic: Attach bridge chain to encoders
Attach a top-level bridge to each encoder, which will be used for negotiating the bus format and flags. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 98 ++- 1 file changed, 77 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 01d8490393d1..f0242e917d6e 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -132,6 +133,26 @@ struct ingenic_drm { struct drm_private_obj private_obj; }; +struct ingenic_drm_bridge { + struct drm_encoder encoder; + struct drm_bridge bridge; + struct drm_bridge *next_bridge; + + /* +* FIXME: this should really be in ingenic_drm_private_state, but there +* doesn't seem to be a way to retrieve a pointer to it from within +* ingenic_drm_encoder_atomic_mode_set (no drm_atomic_state +* back-pointers). +*/ + struct drm_bus_cfg bus_cfg; +}; + +static inline struct ingenic_drm_bridge * +to_ingenic_drm_bridge(struct drm_encoder *encoder) +{ + return container_of(encoder, struct ingenic_drm_bridge, encoder); +} + static inline struct ingenic_drm_private_state * to_ingenic_drm_priv_state(struct drm_private_state *state) { @@ -749,11 +770,10 @@ static void ingenic_drm_encoder_atomic_mode_set(struct drm_encoder *encoder, { struct ingenic_drm *priv = drm_device_get_priv(encoder->dev); struct drm_display_mode *mode = &crtc_state->adjusted_mode; - struct drm_connector *conn = conn_state->connector; - struct drm_display_info *info = &conn->display_info; + struct ingenic_drm_bridge *bridge = to_ingenic_drm_bridge(encoder); unsigned int cfg, rgbcfg = 0; - priv->panel_is_sharp = info->bus_flags & DRM_BUS_FLAG_SHARP_SIGNALS; + priv->panel_is_sharp = bridge->bus_cfg.flags & DRM_BUS_FLAG_SHARP_SIGNALS; if (priv->panel_is_sharp) { cfg = JZ_LCD_CFG_MODE_SPECIAL_TFT_1 | JZ_LCD_CFG_REV_POLARITY; @@ -766,19 +786,19 @@
static void ingenic_drm_encoder_atomic_mode_set(struct drm_encoder *encoder, cfg |= JZ_LCD_CFG_HSYNC_ACTIVE_LOW; if (mode->flags & DRM_MODE_FLAG_NVSYNC) cfg |= JZ_LCD_CFG_VSYNC_ACTIVE_LOW; - if (info->bus_flags & DRM_BUS_FLAG_DE_LOW) + if (bridge->bus_cfg.flags & DRM_BUS_FLAG_DE_LOW) cfg |= JZ_LCD_CFG_DE_ACTIVE_LOW; - if (info->bus_flags & DRM_BUS_FLAG_PIXDATA_DRIVE_NEGEDGE) + if (bridge->bus_cfg.flags & DRM_BUS_FLAG_PIXDATA_DRIVE_NEGEDGE) cfg |= JZ_LCD_CFG_PCLK_FALLING_EDGE; if (!priv->panel_is_sharp) { - if (conn->connector_type == DRM_MODE_CONNECTOR_TV) { + if (conn_state->connector->connector_type == DRM_MODE_CONNECTOR_TV) { if (mode->flags & DRM_MODE_FLAG_INTERLACE) cfg |= JZ_LCD_CFG_MODE_TV_OUT_I; else cfg |= JZ_LCD_CFG_MODE_TV_OUT_P; } else { - switch (*info->bus_formats) { + switch (bridge->bus_cfg.format) { case MEDIA_BUS_FMT_RGB565_1X16: cfg |= JZ_LCD_CFG_MODE_GENERIC_16BIT; break; @@ -804,20 +824,31 @@ static void ingenic_drm_encoder_atomic_mode_set(struct drm_encoder *encoder, regmap_write(priv->map, JZ_REG_LCD_RGBC, rgbcfg); } -static int ingenic_drm_encoder_atomic_check(struct drm_encoder *encoder, - struct drm_crtc_state *crtc_state, - struct drm_connector_state *conn_state) +static int ingenic_drm_bridge_attach(struct drm_bridge *bridge, +enum drm_bridge_attach_flags flags) +{ + struct drm_encoder *encoder = bridge->encoder; + struct ingenic_drm_bridge *ingenic_bridge = to_ingenic_drm_bridge(encoder); + + return drm_bridge_attach(encoder, ingenic_bridge->next_bridge, +&ingenic_bridge->bridge, flags); +} + +static int ingenic_drm_bridge_atomic_check(struct drm_bridge *bridge, + struct drm_bridge_state *bridge_state, + struct drm_crtc_state *crtc_state, + struct drm_connector_state *conn_state) { - struct drm_display_info *info = &conn_state->connector->display_info; struct drm_display_mode *mode = &crtc_state->adjusted_mode; + struct drm_encoder *encoder = bridge->encoder; + struct ingenic_drm_bridge *ingenic_bridge = to_ingenic_drm_bridge(encoder); 
- if (info->num_bus_formats != 1) -
[PATCH 10/11] drm/ingenic: Add doublescan feature
A lot of devices with an Ingenic SoC have a weird LCD panel attached, where the pixels are not square. For instance, the AUO A030JTN01 and Innolux EJ030NA panels have a resolution of 320x480 with a 4:3 aspect ratio. All userspace applications are built with the assumption that the pixels are square. To be able to support these devices without too much effort, add a doublescan feature, which allows the f0 and f1 planes to be used with only half of the screen's vertical resolution, where each line of the input is displayed twice. This is done using a chained list of DMA descriptors, one descriptor per output line. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 93 +-- 1 file changed, 87 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 2761478b16e8..01d8490393d1 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -66,6 +66,8 @@ struct jz_soc_info { struct ingenic_gem_object { struct drm_gem_cma_object base; + struct ingenic_dma_hwdesc *hwdescs; + dma_addr_t hwdescs_phys; }; struct ingenic_drm_private_state { @@ -73,6 +75,23 @@ struct ingenic_drm_private_state { bool no_vblank; bool use_palette; + + /* +* A lot of devices with an Ingenic SoC have a weird LCD panel attached, +* where the pixels are not square. For instance, the AUO A030JTN01 and +* Innolux EJ030NA panels have a resolution of 320x480 with a 4:3 aspect +* ratio. +* +* All userspace applications are built with the assumption that the +* pixels are square. To be able to support these devices without too +* much effort, add a doublescan feature, which allows the f0 and f1 +* planes to be used with only half of the screen's vertical resolution, +* where each line of the input is displayed twice. +* +* This is done using a chained list of DMA descriptors, one descriptor +* per output line. 
+*/ + bool doublescan; }; struct ingenic_drm { @@ -465,7 +484,7 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, return PTR_ERR(priv_state); ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state, - DRM_PLANE_HELPER_NO_SCALING, + 0x8000, DRM_PLANE_HELPER_NO_SCALING, priv->soc_info->has_osd, true); @@ -482,6 +501,17 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, (new_plane_state->src_h >> 16) != new_plane_state->crtc_h)) return -EINVAL; + /* Enable doublescan if the CRTC_H is twice the SRC_H. */ + priv_state->doublescan = (new_plane_state->src_h >> 16) * 2 == new_plane_state->crtc_h; + + /* Otherwise, fail if CRTC_H != SRC_H */ + if (!priv_state->doublescan && (new_plane_state->src_h >> 16) != new_plane_state->crtc_h) + return -EINVAL; + + /* Fail if CRTC_W != SRC_W */ + if ((new_plane_state->src_w >> 16) != new_plane_state->crtc_w) + return -EINVAL; + priv_state->use_palette = new_plane_state->fb && new_plane_state->fb->format->format == DRM_FORMAT_C8; @@ -647,7 +677,9 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct ingenic_drm_private_state *priv_state; struct drm_crtc_state *crtc_state; struct ingenic_dma_hwdesc *hwdesc; - unsigned int width, height, cpp; + unsigned int width, height, cpp, i; + struct drm_gem_object *gem_obj; + struct ingenic_gem_object *obj; dma_addr_t addr, next_addr; bool use_f1; u32 fourcc; @@ -664,17 +696,39 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, height = newstate->src_h >> 16; cpp = newstate->fb->format->cpp[0]; + gem_obj = drm_gem_fb_get_obj(newstate->fb, 0); + obj = to_ingenic_gem_obj(gem_obj); + priv_state = ingenic_drm_get_new_priv_state(priv, state); if (priv_state && priv_state->use_palette) next_addr = dma_hwdesc_pal_addr(priv); else next_addr = dma_hwdesc_addr(priv, use_f1); - hwdesc = &priv->dma_hwdescs->hwdesc[use_f1]; + if (priv_state->doublescan) { + hwdesc = &obj->hwdescs[0]; + /* +* Use one DMA descriptor 
per output line, and display +* each input line twice. +*/ +
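The doublescan patch has two parts: the plane check accepts a CRTC height of exactly twice the source height, and the DMA chain uses one descriptor per output line so each input line is read twice. In drm_atomic_helper_check_plane_state() the min_scale argument is a 16.16 ratio of source to destination size, so 0x8000 (0.5) allows the destination to be up to 2x the source. The sketch below models both parts in userspace. `struct hwdesc`, `check_plane()` and `build_doublescan_chain()` are hypothetical, and looping the last descriptor back to the first is an assumption, since the patch's loop is cut off above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor layout; field names loosely follow the driver. */
struct hwdesc {
	uint32_t next; /* bus address of the next descriptor */
	uint32_t addr; /* bus address of the source line */
};

/*
 * Plane check in the spirit of the patch: src_* are 16.16 fixed point,
 * crtc_* are integers. Returns 0 and sets *doublescan, or -1 to reject.
 */
static int check_plane(uint32_t src_w, uint32_t src_h,
		       uint32_t crtc_w, uint32_t crtc_h, bool *doublescan)
{
	uint32_t w = src_w >> 16, h = src_h >> 16;

	*doublescan = h * 2 == crtc_h;
	if (!*doublescan && h != crtc_h)
		return -1;
	if (w != crtc_w) /* no horizontal scaling */
		return -1;
	return 0;
}

/*
 * One descriptor per output line: output line i reads input line i / 2, so
 * every input line is scanned out twice. The last descriptor loops back to
 * the first (assumption). desc_base is the bus address of descs[0].
 */
static void build_doublescan_chain(struct hwdesc *descs, unsigned int out_lines,
				   uint32_t desc_base, uint32_t fb_base,
				   uint32_t pitch)
{
	assert(out_lines > 0);
	for (unsigned int i = 0; i < out_lines; i++) {
		unsigned int next = (i + 1) % out_lines;

		descs[i].addr = fb_base + (i / 2) * pitch;
		descs[i].next = desc_base + next * (uint32_t)sizeof(struct hwdesc);
	}
}
```

For the 320x480 panels mentioned in the commit message, userspace can then submit a 320x240 buffer with square pixels at 4:3.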
[PATCH 08/11] drm/ingenic: Support custom GEM object
Add boilerplate code to support a custom "ingenic_gem_object". Empty for now, but it will be useful later when subsequent patches will introduce object-specific driver data. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index ced2109e8f35..1cac369f6293 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -64,6 +64,10 @@ struct jz_soc_info { unsigned int num_formats_f0, num_formats_f1; }; +struct ingenic_gem_object { + struct drm_gem_cma_object base; +}; + struct ingenic_drm_private_state { struct drm_private_state base; @@ -179,6 +183,11 @@ static inline struct ingenic_drm *drm_nb_get_priv(struct notifier_block *nb) return container_of(nb, struct ingenic_drm, clock_nb); } +static inline struct ingenic_gem_object *to_ingenic_gem_obj(struct drm_gem_object *gem_obj) +{ + return container_of(gem_obj, struct ingenic_gem_object, base.base); +} + static inline dma_addr_t dma_hwdesc_addr(const struct ingenic_drm *priv, bool use_f1) { u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc[use_f1]); @@ -853,15 +862,15 @@ static struct drm_gem_object * ingenic_drm_gem_create_object(struct drm_device *drm, size_t size) { struct ingenic_drm *priv = drm_device_get_priv(drm); - struct drm_gem_cma_object *obj; + struct ingenic_gem_object *obj; obj = kzalloc(sizeof(*obj), GFP_KERNEL); if (!obj) return ERR_PTR(-ENOMEM); - obj->map_noncoherent = priv->soc_info->map_noncoherent; + obj->base.map_noncoherent = priv->soc_info->map_noncoherent; - return &obj->base; + return &obj->base.base; } static struct drm_private_state * -- 2.30.2
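to_ingenic_gem_obj() uses container_of() with the nested member `base.base` because the generic GEM object is embedded two levels deep: inside the CMA object, which is itself embedded in the driver object. This works because ingenic_drm_gem_create_object() allocates the outer struct and returns a pointer to the innermost member. Below is a userspace sketch of that pattern, with a simplified container_of() (no type checking) and hypothetical stand-in structs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() (without type checks). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the three nested object levels. */
struct gem_object { size_t size; };

struct gem_cma_object {
	struct gem_object base;
	int map_noncoherent;
};

struct ingenic_gem_object_model {
	struct gem_cma_object base;
	void *driver_data; /* room for the per-object data later patches add */
};

static struct ingenic_gem_object_model *to_model(struct gem_object *gem_obj)
{
	assert(gem_obj != NULL);
	/* base.base: the generic object sits two embedding levels down. */
	return container_of(gem_obj, struct ingenic_gem_object_model, base.base);
}
```

The conversion is only valid for objects that really were allocated as the outer type, which holds here because the driver's create_object hook does the allocation.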
[PATCH 09/11] drm/ingenic: Add ingenic_drm_gem_fb_destroy() function
Add an ingenic_drm_gem_fb_destroy() function, which currently only calls drm_gem_fb_destroy(), but will be extended in a subsequent patch. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 1cac369f6293..2761478b16e8 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -846,16 +846,38 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc) regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0); } +static void ingenic_drm_gem_fb_destroy(struct drm_framebuffer *fb) +{ + drm_gem_fb_destroy(fb); +} + +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = { + .destroy= ingenic_drm_gem_fb_destroy, + .create_handle = drm_gem_fb_create_handle, +}; + +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs_dirty = { + .destroy= ingenic_drm_gem_fb_destroy, + .create_handle = drm_gem_fb_create_handle, + .dirty = drm_atomic_helper_dirtyfb, +}; + static struct drm_framebuffer * ingenic_drm_gem_fb_create(struct drm_device *drm, struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd) { struct ingenic_drm *priv = drm_device_get_priv(drm); + const struct drm_framebuffer_funcs *fb_funcs; + struct drm_framebuffer *fb; if (priv->soc_info->map_noncoherent) - return drm_gem_fb_create_with_dirty(drm, file, mode_cmd); + fb_funcs = &ingenic_drm_gem_fb_funcs_dirty; + else + fb_funcs = &ingenic_drm_gem_fb_funcs; + + fb = drm_gem_fb_create_with_funcs(drm, file, mode_cmd, fb_funcs); - return drm_gem_fb_create(drm, file, mode_cmd); + return fb; } static struct drm_gem_object * -- 2.30.2
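The patch replaces two helper entry points with one create call that takes one of two const function tables, and both tables route destroy through a driver-owned wrapper. That wrapper is the hook later patches can extend without touching the tables again. Below is a userspace sketch of the same shape. The names (`fb_funcs`, `select_fb_funcs`, `driver_destroy`) are hypothetical stand-ins for the drm_framebuffer_funcs setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fb;

/* Hypothetical stand-in for struct drm_framebuffer_funcs. */
struct fb_funcs {
	void (*destroy)(struct fb *fb);
	int (*dirty)(struct fb *fb); /* optional; NULL when not needed */
};

struct fb {
	const struct fb_funcs *funcs;
	int destroyed;
};

/* Models the generic helper (drm_gem_fb_destroy() in the patch). */
static void generic_destroy(struct fb *fb)
{
	fb->destroyed = 1;
}

/* Driver wrapper: only forwards today, but gives later patches a hook. */
static void driver_destroy(struct fb *fb)
{
	assert(fb != NULL);
	generic_destroy(fb);
}

/* Models drm_atomic_helper_dirtyfb(); here it just reports success. */
static int generic_dirty(struct fb *fb)
{
	(void)fb;
	return 0;
}

static const struct fb_funcs funcs_plain = {
	.destroy = driver_destroy,
};

static const struct fb_funcs funcs_dirty = {
	.destroy = driver_destroy,
	.dirty = generic_dirty,
};

/* Non-coherent buffers need explicit dirty handling, so pick that table. */
static const struct fb_funcs *select_fb_funcs(bool map_noncoherent)
{
	return map_noncoherent ? &funcs_dirty : &funcs_plain;
}
```

Keeping both tables const and choosing between them at create time avoids per-framebuffer allocation of function pointers.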
[PATCH 07/11] drm/ingenic: Upload palette before frame
When using C8 color mode, make sure that the palette is always uploaded before a frame; otherwise the very first frame will have wrong colors. Do that by changing the link order of the DMA descriptors. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 45 ++- 1 file changed, 35 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 5ba3283da97d..ced2109e8f35 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -68,6 +68,7 @@ struct ingenic_drm_private_state { struct drm_private_state base; bool no_vblank; + bool use_palette; }; struct ingenic_drm { @@ -185,6 +186,13 @@ static inline dma_addr_t dma_hwdesc_addr(const struct ingenic_drm *priv, bool us return priv->dma_hwdescs_phys + offset; } +static inline dma_addr_t dma_hwdesc_pal_addr(const struct ingenic_drm *priv) +{ + u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc_pal); + + return priv->dma_hwdescs_phys + offset; +} + static int ingenic_drm_update_pixclk(struct notifier_block *nb, unsigned long action, void *data) @@ -207,11 +215,19 @@ static void ingenic_drm_crtc_atomic_enable(struct drm_crtc *crtc, struct drm_atomic_state *state) { struct ingenic_drm *priv = drm_crtc_get_priv(crtc); + struct ingenic_drm_private_state *priv_state; + + priv_state = ingenic_drm_get_new_priv_state(priv, state); + if (WARN_ON(!priv_state)) + return; regmap_write(priv->map, JZ_REG_LCD_STATE, 0); /* Set address of our DMA descriptor chain */ - regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 0)); + if (priv_state->use_palette) + regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_pal_addr(priv)); + else + regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 0)); regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_addr(priv, 1)); regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, @@ -422,6 +438,7 @@ static int ingenic_drm_plane_atomic_check(struct 
drm_plane *plane, struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); struct ingenic_drm *priv = drm_device_get_priv(plane->dev); + struct ingenic_drm_private_state *priv_state; struct drm_crtc_state *crtc_state; struct drm_crtc *crtc = new_plane_state->crtc ?: old_plane_state->crtc; int ret; @@ -434,6 +451,10 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, if (WARN_ON(!crtc_state)) return -EINVAL; + priv_state = ingenic_drm_get_priv_state(priv, state); + if (IS_ERR(priv_state)) + return PTR_ERR(priv_state); + ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state, DRM_PLANE_HELPER_NO_SCALING, DRM_PLANE_HELPER_NO_SCALING, @@ -452,6 +473,9 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane, (new_plane_state->src_h >> 16) != new_plane_state->crtc_h)) return -EINVAL; + priv_state->use_palette = new_plane_state->fb && + new_plane_state->fb->format->format == DRM_FORMAT_C8; + /* * Require full modeset if enabling or disabling a plane, or changing * its position, size or depth. 
@@ -611,10 +635,11 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct ingenic_drm *priv = drm_device_get_priv(plane->dev); struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane); struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane); + struct ingenic_drm_private_state *priv_state; struct drm_crtc_state *crtc_state; struct ingenic_dma_hwdesc *hwdesc; - unsigned int width, height, cpp, offset; - dma_addr_t addr; + unsigned int width, height, cpp; + dma_addr_t addr, next_addr; bool use_f1; u32 fourcc; @@ -630,23 +655,23 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, height = newstate->src_h >> 16; cpp = newstate->fb->format->cpp[0]; + priv_state = ingenic_drm_get_new_priv_state(priv, state); + if (priv_state && priv_state->use_palette) + next_addr = dma_hwdesc_pal_addr(priv); + else + next_addr = dma_hwdesc_addr(priv, use_f1); + hwdesc = &priv->dma_hwdescs->hwdesc[use_f1];
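A small model of the descriptor link order described in "Upload palette before frame" may help. Descriptors are referenced by index rather than by DMA address, and setup_chain() and walk() are hypothetical helpers, not driver functions. When the palette is in use, the chain starts at the palette descriptor, which always links to the frame descriptor, so the palette is loaded before the first frame is scanned out; starting at the frame descriptor instead would show one frame before any palette upload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: descriptors reference each other by index. */
enum { DESC_F0, DESC_PAL, NUM_DESCS };

struct desc {
	int next;	/* index of the next descriptor in the chain */
};

/* Link the chain and return the value that would be written to DA0. */
static int setup_chain(struct desc d[NUM_DESCS], bool use_palette)
{
	d[DESC_PAL].next = DESC_F0;	/* palette is always followed by the frame */
	d[DESC_F0].next = use_palette ? DESC_PAL : DESC_F0;

	return use_palette ? DESC_PAL : DESC_F0;
}

/* Follow the chain for 'steps' descriptors, recording the visiting order. */
static void walk(const struct desc d[NUM_DESCS], int start, int *order, size_t steps)
{
	int cur = start;

	for (size_t i = 0; i < steps; i++) {
		order[i] = cur;
		cur = d[cur].next;
	}
}
```

With use_palette set, the walk visits palette, frame, palette, frame, ...; without it, the frame descriptor simply loops on itself.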
[PATCH 06/11] drm/ingenic: Set DMA descriptor chain register when starting CRTC
Setting the DMA descriptor chain register in the probe function has been fine until now, because we only ever had one descriptor per foreground. As the driver will soon have real descriptor chains, and the DMA descriptor chain register updates itself to point to the current descriptor being processed, this register needs to be reset after a full modeset to point to the first descriptor of the chain. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 639994329c60..5ba3283da97d 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -210,6 +210,10 @@ static void ingenic_drm_crtc_atomic_enable(struct drm_crtc *crtc, regmap_write(priv->map, JZ_REG_LCD_STATE, 0); + /* Set address of our DMA descriptor chain */ + regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, 0)); + regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_addr(priv, 1)); + regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_ENABLE | JZ_LCD_CTRL_DISABLE, JZ_LCD_CTRL_ENABLE); @@ -1218,10 +1222,6 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) } } - /* Set address of our DMA descriptor chain */ - regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_phys_f0); - regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_phys_f1); - /* Enable OSD if available */ if (soc_info->has_osd) regmap_write(priv->map, JZ_REG_LCD_OSDC, JZ_LCD_OSDC_OSDEN); -- 2.30.2
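The reason for moving the register writes into .atomic_enable can be modelled with a DMA channel whose address register follows the chain as it is processed. This is a hypothetical sketch: the real JZ_REG_LCD_DA0/DA1 registers hold DMA addresses rather than indices, and dma_step() and crtc_enable() are illustrative names only.

```c
#include <assert.h>

/* Hypothetical model of a channel whose address register tracks the chain. */
struct dma_chan {
	int da;			/* current descriptor, like DA0 */
	const int *next;	/* next[i] = descriptor that follows descriptor i */
};

/* The hardware moves on to the next descriptor once one is processed. */
static void dma_step(struct dma_chan *ch)
{
	ch->da = ch->next[ch->da];
}

/* Reset on every CRTC enable, not just once at probe time. */
static void crtc_enable(struct dma_chan *ch, int first_desc)
{
	ch->da = first_desc;
}
```

After one step the register no longer points at the head of the chain, which is why a value written once at probe time stops being valid as soon as chains have more than one descriptor.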
[PATCH 05/11] drm/ingenic: Move IPU scale settings to private state
The IPU scaling information is computed in the plane's ".atomic_check" callback, and used in the ".atomic_update" callback. As such, it is state-specific, and should be moved to the private state structure. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-ipu.c | 73 --- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 007cd547b285..b85d9a7f53d3 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -47,6 +47,8 @@ struct soc_info { struct ingenic_ipu_private_state { struct drm_private_state base; + + unsigned int num_w, num_h, denom_w, denom_h; }; struct ingenic_ipu { @@ -58,8 +60,6 @@ struct ingenic_ipu { const struct soc_info *soc_info; bool clk_enabled; - unsigned int num_w, num_h, denom_w, denom_h; - dma_addr_t addr_y, addr_u, addr_v; struct drm_property *sharpness_prop; @@ -85,6 +85,30 @@ to_ingenic_ipu_priv_state(struct drm_private_state *state) return container_of(state, struct ingenic_ipu_private_state, base); } +static struct ingenic_ipu_private_state * +ingenic_ipu_get_priv_state(struct ingenic_ipu *priv, struct drm_atomic_state *state) +{ + struct drm_private_state *priv_state; + + priv_state = drm_atomic_get_private_obj_state(state, &priv->private_obj); + if (IS_ERR(priv_state)) + return ERR_CAST(priv_state); + + return to_ingenic_ipu_priv_state(priv_state); +} + +static struct ingenic_ipu_private_state * +ingenic_ipu_get_new_priv_state(struct ingenic_ipu *priv, struct drm_atomic_state *state) +{ + struct drm_private_state *priv_state; + + priv_state = drm_atomic_get_new_private_obj_state(state, &priv->private_obj); + if (!priv_state) + return NULL; + + return to_ingenic_ipu_priv_state(priv_state); +} + /* * Apply conventional cubic convolution kernel. Both parameters * and return value are 15.16 signed fixed-point. 
@@ -305,11 +329,16 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, const struct drm_format_info *finfo; u32 ctrl, stride = 0, coef_index = 0, format = 0; bool needs_modeset, upscaling_w, upscaling_h; + struct ingenic_ipu_private_state *ipu_state; int err; if (!newstate || !newstate->fb) return; + ipu_state = ingenic_ipu_get_new_priv_state(ipu, state); + if (WARN_ON(!ipu_state)) + return; + finfo = drm_format_info(newstate->fb->format->format); if (!ipu->clk_enabled) { @@ -482,27 +511,27 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, if (ipu->soc_info->has_bicubic) ctrl |= JZ_IPU_CTRL_ZOOM_SEL; - upscaling_w = ipu->num_w > ipu->denom_w; + upscaling_w = ipu_state->num_w > ipu_state->denom_w; if (upscaling_w) ctrl |= JZ_IPU_CTRL_HSCALE; - if (ipu->num_w != 1 || ipu->denom_w != 1) { + if (ipu_state->num_w != 1 || ipu_state->denom_w != 1) { if (!ipu->soc_info->has_bicubic && !upscaling_w) - coef_index |= (ipu->denom_w - 1) << 16; + coef_index |= (ipu_state->denom_w - 1) << 16; else - coef_index |= (ipu->num_w - 1) << 16; + coef_index |= (ipu_state->num_w - 1) << 16; ctrl |= JZ_IPU_CTRL_HRSZ_EN; } - upscaling_h = ipu->num_h > ipu->denom_h; + upscaling_h = ipu_state->num_h > ipu_state->denom_h; if (upscaling_h) ctrl |= JZ_IPU_CTRL_VSCALE; - if (ipu->num_h != 1 || ipu->denom_h != 1) { + if (ipu_state->num_h != 1 || ipu_state->denom_h != 1) { if (!ipu->soc_info->has_bicubic && !upscaling_h) - coef_index |= ipu->denom_h - 1; + coef_index |= ipu_state->denom_h - 1; else - coef_index |= ipu->num_h - 1; + coef_index |= ipu_state->num_h - 1; ctrl |= JZ_IPU_CTRL_VRSZ_EN; } @@ -513,13 +542,13 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane, /* Set the LUT index register */ regmap_write(ipu->map, JZ_REG_IPU_RSZ_COEF_INDEX, coef_index); - if (ipu->num_w != 1 || ipu->denom_w != 1) + if (ipu_state->num_w != 1 || ipu_state->denom_w != 1) ingenic_ipu_set_coefs(ipu, JZ_REG_IPU_HRSZ_COEF_LUT, - ipu->num_w, 
ipu->denom_w); + ipu_state->num_w, ipu_state->denom_w); - if (ipu->num_h != 1 || ipu->denom_h != 1) + if (ipu_state->num_h != 1 || ipu_state->denom_h != 1) ingenic_ipu_set_coefs(ipu, JZ_REG_IPU_VRSZ_COEF_LUT, -
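Once the scaling ratios live in the private state, the LUT index written to JZ_REG_IPU_RSZ_COEF_INDEX depends only on that state and on whether the SoC has bicubic scaling. The sketch below re-expresses the coef_index computation from the hunk above as a standalone function; struct ipu_scale_state and ipu_coef_index() are illustrative names, and the JZ_IPU_CTRL_* bits are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified copy of the fields moved into ingenic_ipu_private_state. */
struct ipu_scale_state {
	unsigned int num_w, num_h, denom_w, denom_h;
};

static uint32_t ipu_coef_index(const struct ipu_scale_state *s, bool has_bicubic)
{
	uint32_t coef_index = 0;
	bool upscaling_w = s->num_w > s->denom_w;
	bool upscaling_h = s->num_h > s->denom_h;

	/* Horizontal LUT index goes in the upper 16 bits. */
	if (s->num_w != 1 || s->denom_w != 1) {
		if (!has_bicubic && !upscaling_w)
			coef_index |= (s->denom_w - 1) << 16;
		else
			coef_index |= (s->num_w - 1) << 16;
	}

	/* Vertical LUT index goes in the lower bits. */
	if (s->num_h != 1 || s->denom_h != 1) {
		if (!has_bicubic && !upscaling_h)
			coef_index |= s->denom_h - 1;
		else
			coef_index |= s->num_h - 1;
	}

	return coef_index;
}
```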
[PATCH 04/11] drm/ingenic: Move no_vblank to private state
This information is carried from the ".atomic_check" to the ".atomic_commit_tail"; as such it is state-specific, and should be moved to the private state structure. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 41 --- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index e81084eb3b0e..639994329c60 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -66,6 +66,8 @@ struct jz_soc_info { struct ingenic_drm_private_state { struct drm_private_state base; + + bool no_vblank; }; struct ingenic_drm { @@ -87,7 +89,6 @@ struct ingenic_drm { dma_addr_t dma_hwdescs_phys; bool panel_is_sharp; - bool no_vblank; /* * clk_mutex is used to synchronize the pixel clock rate update with @@ -113,6 +114,30 @@ to_ingenic_drm_priv_state(struct drm_private_state *state) return container_of(state, struct ingenic_drm_private_state, base); } +static struct ingenic_drm_private_state * +ingenic_drm_get_priv_state(struct ingenic_drm *priv, struct drm_atomic_state *state) +{ + struct drm_private_state *priv_state; + + priv_state = drm_atomic_get_private_obj_state(state, &priv->private_obj); + if (IS_ERR(priv_state)) + return ERR_CAST(priv_state); + + return to_ingenic_drm_priv_state(priv_state); +} + +static struct ingenic_drm_private_state * +ingenic_drm_get_new_priv_state(struct ingenic_drm *priv, struct drm_atomic_state *state) +{ + struct drm_private_state *priv_state; + + priv_state = drm_atomic_get_new_private_obj_state(state, &priv->private_obj); + if (!priv_state) + return NULL; + + return to_ingenic_drm_priv_state(priv_state); +} + static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -268,6 +293,7 @@ static int ingenic_drm_crtc_atomic_check(struct drm_crtc *crtc, crtc); struct ingenic_drm *priv = drm_crtc_get_priv(crtc); struct drm_plane_state *f1_state, *f0_state, 
*ipu_state = NULL; + struct ingenic_drm_private_state *priv_state; if (crtc_state->gamma_lut && drm_color_lut_size(crtc_state->gamma_lut) != ARRAY_SIZE(priv->dma_hwdescs->palette)) { @@ -299,9 +325,13 @@ static int ingenic_drm_crtc_atomic_check(struct drm_crtc *crtc, } } + priv_state = ingenic_drm_get_priv_state(priv, state); + if (IS_ERR(priv_state)) + return PTR_ERR(priv_state); + /* If all the planes are disabled, we won't get a VBLANK IRQ */ - priv->no_vblank = !f1_state->fb && !f0_state->fb && - !(ipu_state && ipu_state->fb); + priv_state->no_vblank = !f1_state->fb && !f0_state->fb && + !(ipu_state && ipu_state->fb); } return 0; @@ -727,6 +757,7 @@ static void ingenic_drm_atomic_helper_commit_tail(struct drm_atomic_state *old_s */ struct drm_device *dev = old_state->dev; struct ingenic_drm *priv = drm_device_get_priv(dev); + struct ingenic_drm_private_state *priv_state; drm_atomic_helper_commit_modeset_disables(dev, old_state); @@ -736,7 +767,9 @@ static void ingenic_drm_atomic_helper_commit_tail(struct drm_atomic_state *old_s drm_atomic_helper_commit_hw_done(old_state); - if (!priv->no_vblank) + priv_state = ingenic_drm_get_new_priv_state(priv, old_state); + + if (!priv_state || !priv_state->no_vblank) drm_atomic_helper_wait_for_vblanks(dev, old_state); drm_atomic_helper_cleanup_planes(dev, old_state); -- 2.30.2
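The check/commit split for no_vblank can be sketched in plain C: the check phase writes the decision into the state object being checked, and the commit tail only reads it back, treating a missing private state as "wait for vblank" as the patch does. The types and function names are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plane state: only whether a framebuffer is attached. */
struct plane_state {
	const void *fb;
};

struct drm_priv_state {
	bool no_vblank;
};

/* Check phase: record the decision in the new state, not in the device. */
static void crtc_check(struct drm_priv_state *ps, const struct plane_state *f0,
		       const struct plane_state *f1, const struct plane_state *ipu)
{
	/* If all the planes are disabled, no VBLANK IRQ will arrive. */
	ps->no_vblank = !f1->fb && !f0->fb && !(ipu && ipu->fb);
}

/* Commit tail: wait unless this state says no vblank is coming. */
static bool should_wait_for_vblank(const struct drm_priv_state *ps)
{
	return !ps || !ps->no_vblank;
}
```

Keeping the flag in the state means a second commit checked while the first is still in flight cannot overwrite the value the first commit's tail is about to read.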
[PATCH 03/11] drm/ingenic: Add support for private objects
Until now, the ingenic-drm as well as the ingenic-ipu drivers used to put state-specific information in their respective private structure. Add boilerplate code to support private objects in the two drivers, so that state-specific information can be put in the state-specific private structure. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 61 +++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 54 2 files changed, 115 insertions(+) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 4e41bdf2f3fd..e81084eb3b0e 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -64,6 +64,10 @@ struct jz_soc_info { unsigned int num_formats_f0, num_formats_f1; }; +struct ingenic_drm_private_state { + struct drm_private_state base; +}; + struct ingenic_drm { struct drm_device drm; /* @@ -99,8 +103,16 @@ struct ingenic_drm { struct mutex clk_mutex; bool update_clk_rate; struct notifier_block clock_nb; + + struct drm_private_obj private_obj; }; +static inline struct ingenic_drm_private_state * +to_ingenic_drm_priv_state(struct drm_private_state *state) +{ + return container_of(state, struct ingenic_drm_private_state, base); +} + static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg) { switch (reg) { @@ -790,6 +802,28 @@ ingenic_drm_gem_create_object(struct drm_device *drm, size_t size) return &obj->base; } +static struct drm_private_state * +ingenic_drm_duplicate_state(struct drm_private_obj *obj) +{ + struct ingenic_drm_private_state *state = to_ingenic_drm_priv_state(obj->state); + + state = kmemdup(state, sizeof(*state), GFP_KERNEL); + if (!state) + return NULL; + + __drm_atomic_helper_private_obj_duplicate_state(obj, &state->base); + + return &state->base; +} + +static void ingenic_drm_destroy_state(struct drm_private_obj *obj, + struct drm_private_state *state) +{ + struct ingenic_drm_private_state *priv_state = 
to_ingenic_drm_priv_state(state); + + kfree(priv_state); +} + DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops); static const struct drm_driver ingenic_drm_driver_data = { @@ -863,6 +897,11 @@ static struct drm_mode_config_helper_funcs ingenic_drm_mode_config_helpers = { .atomic_commit_tail = ingenic_drm_atomic_helper_commit_tail, }; +static const struct drm_private_state_funcs ingenic_drm_private_state_funcs = { + .atomic_duplicate_state = ingenic_drm_duplicate_state, + .atomic_destroy_state = ingenic_drm_destroy_state, +}; + static void ingenic_drm_unbind_all(void *d) { struct ingenic_drm *priv = d; @@ -875,9 +914,15 @@ static void __maybe_unused ingenic_drm_release_rmem(void *d) of_reserved_mem_device_release(d); } +static void ingenic_drm_atomic_private_obj_fini(struct drm_device *drm, void *private_obj) +{ + drm_atomic_private_obj_fini(private_obj); +} + static int ingenic_drm_bind(struct device *dev, bool has_components) { struct platform_device *pdev = to_platform_device(dev); + struct ingenic_drm_private_state *private_state; const struct jz_soc_info *soc_info; struct ingenic_drm *priv; struct clk *parent_clk; @@ -1158,6 +1203,20 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) goto err_devclk_disable; } + private_state = kzalloc(sizeof(*private_state), GFP_KERNEL); + if (!private_state) { + ret = -ENOMEM; + goto err_clk_notifier_unregister; + } + + drm_atomic_private_obj_init(drm, &priv->private_obj, &private_state->base, + &ingenic_drm_private_state_funcs); + + ret = drmm_add_action_or_reset(drm, ingenic_drm_atomic_private_obj_fini, + &priv->private_obj); + if (ret) + goto err_private_state_free; + ret = drm_dev_register(drm, 0); if (ret) { dev_err(dev, "Failed to register DRM driver\n"); @@ -1168,6 +1227,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) return 0; +err_private_state_free: + kfree(private_state); err_clk_notifier_unregister: clk_notifier_unregister(parent_clk, &priv->clock_nb); err_devclk_disable: 
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c index 61b6d9fdbba1..007cd547b285 100644 --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c @@ -45,6 +45,10 @@ struct soc_info { unsigned int weight, unsigned int offset); }; +struct ingenic_ipu_private_state { + struct drm_private_state base; +}; + struct ingenic_ipu { struct d
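The duplicate/destroy pair added in "Add support for private objects" follows the usual pattern for subclassed private state: copy the whole driver structure, return a pointer to the embedded base, and free the outer structure on destroy. Below is a userspace sketch, assuming malloc()/memcpy() in place of kmemdup() and omitting what __drm_atomic_helper_private_obj_duplicate_state() does to the base; all types and the dup_bytes() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for drm_private_state / drm_private_obj. */
struct private_state {
	int dummy;
};

struct private_obj {
	struct private_state *state;
};

struct my_priv_state {
	struct private_state base;	/* kept first so the casts below are valid */
	bool no_vblank;
};

/* Userspace equivalent of kmemdup(). */
static void *dup_bytes(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

static struct private_state *duplicate_state(struct private_obj *obj)
{
	struct my_priv_state *old = (struct my_priv_state *)obj->state;
	struct my_priv_state *copy = dup_bytes(old, sizeof(*copy));

	/* The driver-specific fields travel with the copy. */
	return copy ? &copy->base : NULL;
}

static void destroy_state(struct private_state *state)
{
	free((struct my_priv_state *)state);
}
```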
[PATCH 02/11] drm/ingenic: Simplify code by using hwdescs array
Instead of having one 'hwdesc' variable for the plane #0 and one for the plane #1, use a 'hwdesc[2]' array, where the DMA hardware descriptors are indexed by the plane's number. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 38 --- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 93c099e7464d..4e41bdf2f3fd 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -50,8 +50,7 @@ struct ingenic_dma_hwdesc { } __aligned(16); struct ingenic_dma_hwdescs { - struct ingenic_dma_hwdesc hwdesc_f0; - struct ingenic_dma_hwdesc hwdesc_f1; + struct ingenic_dma_hwdesc hwdesc[2]; struct ingenic_dma_hwdesc hwdesc_pal; u16 palette[256] __aligned(16); }; @@ -142,6 +141,13 @@ static inline struct ingenic_drm *drm_nb_get_priv(struct notifier_block *nb) return container_of(nb, struct ingenic_drm, clock_nb); } +static inline dma_addr_t dma_hwdesc_addr(const struct ingenic_drm *priv, bool use_f1) +{ + u32 offset = offsetof(struct ingenic_dma_hwdescs, hwdesc[use_f1]); + + return priv->dma_hwdescs_phys + offset; +} + static int ingenic_drm_update_pixclk(struct notifier_block *nb, unsigned long action, void *data) @@ -563,6 +569,7 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, struct ingenic_dma_hwdesc *hwdesc; unsigned int width, height, cpp, offset; dma_addr_t addr; + bool use_f1; u32 fourcc; if (newstate && newstate->fb) { @@ -570,16 +577,14 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, drm_fb_cma_sync_non_coherent(&priv->drm, oldstate, newstate); crtc_state = newstate->crtc->state; + use_f1 = priv->soc_info->has_osd && plane != &priv->f0; addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0); width = newstate->src_w >> 16; height = newstate->src_h >> 16; cpp = newstate->fb->format->cpp[0]; - if (!priv->soc_info->has_osd || plane == &priv->f0) - hwdesc 
= &priv->dma_hwdescs->hwdesc_f0; - else - hwdesc = &priv->dma_hwdescs->hwdesc_f1; + hwdesc = &priv->dma_hwdescs->hwdesc[use_f1]; hwdesc->addr = addr; hwdesc->cmd = JZ_LCD_CMD_EOF_IRQ | (width * height * cpp / 4); @@ -592,9 +597,9 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane, if (fourcc == DRM_FORMAT_C8) offset = offsetof(struct ingenic_dma_hwdescs, hwdesc_pal); else - offset = offsetof(struct ingenic_dma_hwdescs, hwdesc_f0); + offset = offsetof(struct ingenic_dma_hwdescs, hwdesc[0]); - priv->dma_hwdescs->hwdesc_f0.next = priv->dma_hwdescs_phys + offset; + priv->dma_hwdescs->hwdesc[0].next = priv->dma_hwdescs_phys + offset; crtc_state->color_mgmt_changed = fourcc == DRM_FORMAT_C8; } @@ -968,20 +973,17 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) /* Configure DMA hwdesc for foreground0 plane */ - dma_hwdesc_phys_f0 = priv->dma_hwdescs_phys - + offsetof(struct ingenic_dma_hwdescs, hwdesc_f0); - priv->dma_hwdescs->hwdesc_f0.next = dma_hwdesc_phys_f0; - priv->dma_hwdescs->hwdesc_f0.id = 0xf0; + dma_hwdesc_phys_f0 = dma_hwdesc_addr(priv, 0); + priv->dma_hwdescs->hwdesc[0].next = dma_hwdesc_phys_f0; + priv->dma_hwdescs->hwdesc[0].id = 0xf0; /* Configure DMA hwdesc for foreground1 plane */ - dma_hwdesc_phys_f1 = priv->dma_hwdescs_phys - + offsetof(struct ingenic_dma_hwdescs, hwdesc_f1); - priv->dma_hwdescs->hwdesc_f1.next = dma_hwdesc_phys_f1; - priv->dma_hwdescs->hwdesc_f1.id = 0xf1; + dma_hwdesc_phys_f1 = dma_hwdesc_addr(priv, 1); + priv->dma_hwdescs->hwdesc[1].next = dma_hwdesc_phys_f1; + priv->dma_hwdescs->hwdesc[1].id = 0xf1; /* Configure DMA hwdesc for palette */ - priv->dma_hwdescs->hwdesc_pal.next = priv->dma_hwdescs_phys - + offsetof(struct ingenic_dma_hwdescs, hwdesc_f0); + priv->dma_hwdescs->hwdesc_pal.next = dma_hwdesc_phys_f0; priv->dma_hwdescs->hwdesc_pal.id = 0xc0; priv->dma_hwdescs->hwdesc_pal.addr = priv->dma_hwdescs_phys + offsetof(struct ingenic_dma_hwdescs, palette); -- 2.30.2
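dma_hwdesc_addr() passes a runtime array index to offsetof() (hwdesc[use_f1]). ISO C only guarantees offsetof() for designators that evaluate to an address constant; GCC and Clang accept the runtime index as an extension. The sketch below computes the same address portably from the array's offset plus the element size. The struct is a reduced, hypothetical copy of struct ingenic_dma_hwdescs, and __attribute__((aligned(16))) stands in for the kernel's __aligned(16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical copy of the hardware descriptor layout. */
struct dma_hwdesc {
	uint32_t next, addr, id, cmd;
} __attribute__((aligned(16)));

struct dma_hwdescs {
	struct dma_hwdesc hwdesc[2];	/* indexed by plane: 0 = f0, 1 = f1 */
	struct dma_hwdesc hwdesc_pal;
	uint16_t palette[256];
};

/* Address of the f0 or f1 descriptor inside the DMA-coherent block. */
static uint64_t hwdesc_addr(uint64_t base_phys, bool use_f1)
{
	return base_phys + offsetof(struct dma_hwdescs, hwdesc) +
	       (size_t)use_f1 * sizeof(struct dma_hwdesc);
}
```

With a constant index, offsetof(struct dma_hwdescs, hwdesc[1]) is standard C and yields the same offset, which is what the checks below compare against.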
[PATCH 01/11] drm/ingenic: Remove dead code
priv->ipu_plane is assigned a different value further down in the function before the first value is ever read, so the first assignment can be dropped. Signed-off-by: Paul Cercueil --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c index 5244f4763477..93c099e7464d 100644 --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -988,9 +988,6 @@ static int ingenic_drm_bind(struct device *dev, bool has_components) priv->dma_hwdescs->hwdesc_pal.cmd = JZ_LCD_CMD_ENABLE_PAL | (sizeof(priv->dma_hwdescs->palette) / 4); - if (soc_info->has_osd) - priv->ipu_plane = drm_plane_from_index(drm, 0); - primary = priv->soc_info->has_osd ? &priv->f1 : &priv->f0; drm_plane_helper_add(primary, &ingenic_drm_plane_helper_funcs); -- 2.30.2
[PATCH 00/11] ingenic-drm cleanups and doublescan feature
Hi, Here is a set of 11 patches for the ingenic-drm driver. Patches 1-7 are mostly generic cleanups that pave the way for the bigger changes that follow. Patch 3 adds support for a private state structure, which is then used to store state-specific information that was previously stored directly in the driver's private structure. Patch 10 is the big one; it adds a double-scan feature emulated with DMA descriptors. This trick makes it possible to support a handful of boards which have strange panels with non-square pixels (320x480 4:3). Patch 11 updates the driver to support one top-level bridge per encoder, as that seems to be the norm now. Cheers, -Paul Paul Cercueil (11): drm/ingenic: Remove dead code drm/ingenic: Simplify code by using hwdescs array drm/ingenic: Add support for private objects drm/ingenic: Move no_vblank to private state drm/ingenic: Move IPU scale settings to private state drm/ingenic: Set DMA descriptor chain register when starting CRTC drm/ingenic: Upload palette before frame drm/ingenic: Support custom GEM object drm/ingenic: Add ingenic_drm_gem_fb_destroy() function drm/ingenic: Add doublescan feature drm/ingenic: Attach bridge chain to encoders drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 414 ++ drivers/gpu/drm/ingenic/ingenic-ipu.c | 127 ++- 2 files changed, 458 insertions(+), 83 deletions(-) -- 2.30.2
Re: [v1] drm/msm/disp/dpu1: avoid perf update in frame done event
Hi, On Wed, May 26, 2021 at 10:08 PM Krishna Manikandan wrote: > > Crtc perf update from frame event work can result in > wrong bandwidth and clock update from dpu if the work > is scheduled after the swap state has happened. > > Avoid such issues by moving perf update to complete > commit once the frame is accepted by the hardware. > > Fixes: a29c8c024165 ("drm/msm/disp/dpu1: fix display underruns during > modeset") > Signed-off-by: Krishna Manikandan > --- > drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) I don't know much about this code or any of the theory behind it, but I can confirm that this fixes the hang I was seeing with the previous patch. On sc7180-trogdor-lazor: Tested-by: Douglas Anderson
[RFC PATCH 5/5] mm: changes to unref pages with Generic type
From: Alex Sierra Pages in device mappings have 1-based refcounts instead of 0-based ones: a refcount of 1 means the page can be freed. This logic is not applied to the Generic memory type, so its release is treated as that of a normal page instead of being handed to the device driver's release callback. Signed-off-by: Alex Sierra --- include/linux/mm.h | 1 + mm/memremap.c | 5 - 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1af7b9b76948..83bd2f3e111b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1130,6 +1130,7 @@ static inline bool page_is_devmap_managed(struct page *page) switch (page->pgmap->type) { case MEMORY_DEVICE_PRIVATE: case MEMORY_DEVICE_FS_DAX: + case MEMORY_DEVICE_GENERIC: return true; default: break; diff --git a/mm/memremap.c b/mm/memremap.c index 16b2fb482da1..d2563fbcf987 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -44,6 +44,7 @@ EXPORT_SYMBOL(devmap_managed_key); static void devmap_managed_enable_put(struct dev_pagemap *pgmap) { if (pgmap->type == MEMORY_DEVICE_PRIVATE || + pgmap->type == MEMORY_DEVICE_GENERIC || pgmap->type == MEMORY_DEVICE_FS_DAX) static_branch_dec(&devmap_managed_key); } @@ -51,6 +52,7 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap) static void devmap_managed_enable_get(struct dev_pagemap *pgmap) { if (pgmap->type == MEMORY_DEVICE_PRIVATE || + pgmap->type == MEMORY_DEVICE_GENERIC || pgmap->type == MEMORY_DEVICE_FS_DAX) static_branch_inc(&devmap_managed_key); } @@ -480,7 +482,8 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap); void free_devmap_managed_page(struct page *page) { /* notify page idle for dax */ - if (!is_device_private_page(page)) { + if (!(is_device_private_page(page) || + is_device_generic_page(page))) { wake_up_var(&page->_refcount); return; } -- 2.31.1
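The 1-based refcount rule from the commit message can be modelled as follows. This is a hypothetical userspace model, not the kernel's put_page() path: for pages released through the driver's callback (device private and, with this patch, generic), dropping to a count of 1 triggers the release, while ordinary pages are freed at 0. FS_DAX handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical memory types (not the kernel's enum values). */
enum mem_type { MEM_NORMAL, MEM_DEVICE_PRIVATE, MEM_DEVICE_GENERIC };

struct fake_page {
	enum mem_type type;
	int refcount;
	int driver_frees;	/* times the driver's release callback ran */
};

static bool uses_driver_release(const struct fake_page *p)
{
	return p->type == MEM_DEVICE_PRIVATE || p->type == MEM_DEVICE_GENERIC;
}

/* Drop one reference; return true when the page became free. */
static bool put_fake_page(struct fake_page *p)
{
	int r = --p->refcount;

	if (uses_driver_release(p)) {
		/* 1-based: only the pgmap's own reference is left. */
		if (r != 1)
			return false;
		p->driver_frees++;
		return true;
	}

	/* Ordinary pages are 0-based. */
	return r == 0;
}
```

Treating a generic page with the 0-based rule would never reach the driver callback, which is the misbehaviour the commit message describes.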
[RFC PATCH 4/5] mm: add generic type support for device zone page migration
From: Alex Sierra This support is only for anonymous memory of the generic type. Generic-type zone device pages require an extra reference, as is already done for the device private type. Support is also added to migrate page metadata for the generic device type. Signed-off-by: Alex Sierra --- mm/migrate.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 20ca887ea769..33e573a992e5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -380,7 +380,8 @@ static int expected_page_refs(struct address_space *mapping, struct page *page) * Device private pages have an extra refcount as they are * ZONE_DEVICE pages. */ - expected_count += is_device_private_page(page); + expected_count += + (is_device_private_page(page) || is_device_generic_page(page)); if (mapping) expected_count += thp_nr_pages(page) + page_has_private(page); @@ -2607,7 +2608,7 @@ static bool migrate_vma_check_page(struct page *page) * FIXME proper solution is to rework migration_entry_wait() so * it does not need to take a reference on page. */ - return is_device_private_page(page); + return is_device_private_page(page) | is_device_generic_page(page); } /* For file back page */ @@ -3069,10 +3070,12 @@ void migrate_vma_pages(struct migrate_vma *migrate) mapping = page_mapping(page); if (is_zone_device_page(newpage)) { - if (is_device_private_page(newpage)) { + if (is_device_private_page(newpage) || + is_device_generic_page(newpage)) { /* -* For now only support private anonymous when -* migrating to un-addressable device memory. +* For now only support private and devdax/generic +* anonymous when migrating to un-addressable +* device memory. */ if (mapping) { migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; -- 2.31.1
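The effect on expected_page_refs() is easiest to see with a few numbers. The sketch assumes a base count of 1 for the caller's own reference (that initialisation is outside the hunk shown) and replaces the page helpers with plain parameters; expected_refs() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * zone_device_extra: page is device private or (with this patch) generic.
 * nr_pages / has_private stand in for thp_nr_pages() / page_has_private().
 */
static int expected_refs(bool zone_device_extra, bool has_mapping,
			 int nr_pages, bool has_private)
{
	int expected = 1;	/* assumption: the migrating caller's reference */

	expected += zone_device_extra;	/* extra ZONE_DEVICE reference */
	if (has_mapping)
		expected += nr_pages + has_private;

	return expected;
}
```

An anonymous generic page therefore expects 2 references; without the change it would expect 1 and the migration would be refused because the actual count does not match.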
[RFC PATCH 3/5] include/linux/mm.h: helper to check zone device generic type
From: Alex Sierra Helper to check if zone device page is generic type. Signed-off-by: Alex Sierra --- include/linux/mm.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index c9900aedc195..1af7b9b76948 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1158,6 +1158,13 @@ static inline bool is_device_private_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PRIVATE; } +static inline bool is_device_generic_page(const struct page *page) +{ + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && + is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_GENERIC; +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && -- 2.31.1
[RFC PATCH 2/5] drm/amdkfd: generic type as sys mem on migration to ram
From: Alex Sierra On VRAM-to-RAM migration, the CPU accesses Generic device type memory much like system RAM. migrate.flags selects which kind of source pages are migrated; in the Generic type case (CPU connected through XGMI), it should be set to MIGRATE_VMA_SELECT_SYSTEM. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index f5939449a99f..7b41006c1164 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -653,8 +653,9 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange, migrate.vma = vma; migrate.start = start; migrate.end = end; - migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev); + migrate.flags = adev->gmc.xgmi.connected_to_cpu ? + MIGRATE_VMA_SELECT_SYSTEM : MIGRATE_VMA_SELECT_DEVICE_PRIVATE; size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t); size *= npages; -- 2.31.1
[RFC PATCH 1/5] drm/amdkfd: add SPM support for SVM
From: Alex Sierra When the CPU is connected through XGMI, it has coherent access to the VRAM resource. In that case, the resource is looked up in the iomem resource tree at the device's GMC aperture base instead of being requested as a free memory region. This resource is then used, together with the device type (DEVICE_PRIVATE or DEVICE_GENERIC), to create the device page map region. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +--- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 1 - kernel/resource.c| 2 +- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index c8ca3252cbc2..f5939449a99f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -895,6 +895,7 @@ int svm_migrate_init(struct amdgpu_device *adev) struct resource *res; unsigned long size; void *r; + bool xgmi_connected_to_cpu = adev->gmc.xgmi.connected_to_cpu; /* Page migration works on Vega10 or newer */ if (kfddev->device_info->asic_family < CHIP_VEGA10) @@ -907,17 +908,22 @@ int svm_migrate_init(struct amdgpu_device *adev) * should remove reserved size */ size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20); - res = devm_request_free_mem_region(adev->dev, &iomem_resource, size); + if (xgmi_connected_to_cpu) + res = lookup_resource(&iomem_resource, adev->gmc.aper_base); + else + res = devm_request_free_mem_region(adev->dev, &iomem_resource, size); + if (IS_ERR(res)) return -ENOMEM; - pgmap->type = MEMORY_DEVICE_PRIVATE; pgmap->nr_range = 1; pgmap->range.start = res->start; pgmap->range.end = res->end; + pgmap->type = xgmi_connected_to_cpu ?
+ MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE; pgmap->ops = &svm_migrate_pgmap_ops; pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev); - pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; + pgmap->flags = 0; r = devm_memremap_pages(adev->dev, pgmap); if (IS_ERR(r)) { pr_err("failed to register HMM device memory\n"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 21f693767a0d..3881a93192ed 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -38,7 +38,6 @@ #define SVM_RANGE_VRAM_DOMAIN (1UL << 0) #define SVM_ADEV_PGMAP_OWNER(adev)\ ((adev)->hive ? (void *)(adev)->hive : (void *)(adev)) - struct svm_range_bo { struct amdgpu_bo*bo; struct kref kref; diff --git a/kernel/resource.c b/kernel/resource.c index 627e61b0c124..da137553b83e 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -783,7 +783,7 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start) return res; } - +EXPORT_SYMBOL(lookup_resource); /* * Insert a resource into the resource tree. If successful, return NULL, * otherwise return the conflicting resource (compare to __request_resource()) -- 2.31.1
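The two decisions this patch adds to svm_migrate_init() can be summarized in plain C. This is a minimal sketch only: the enum, struct and plan_pgmap() are hypothetical labels, not kernel types. In the patch, the first branch corresponds to lookup_resource(&iomem_resource, adev->gmc.aper_base) and the second to devm_request_free_mem_region().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where the physical range comes from. */
enum range_source { RANGE_FROM_LOOKUP_RESOURCE, RANGE_FROM_FREE_MEM_REGION };

/* Hypothetical labels for the pgmap->type value chosen. */
enum pgmap_kind { PGMAP_DEVICE_GENERIC, PGMAP_DEVICE_PRIVATE };

struct pgmap_plan {
	enum range_source source;
	enum pgmap_kind type;
};

static struct pgmap_plan plan_pgmap(bool xgmi_connected_to_cpu)
{
	struct pgmap_plan plan;

	if (xgmi_connected_to_cpu) {
		/* CPU-coherent VRAM: reuse the existing resource at the GMC
		 * aperture base and map it as generic device memory. */
		plan.source = RANGE_FROM_LOOKUP_RESOURCE;
		plan.type = PGMAP_DEVICE_GENERIC;
	} else {
		/* No coherent CPU access: reserve an unused physical range of
		 * VRAM size and back it with device-private pages. */
		plan.source = RANGE_FROM_FREE_MEM_REGION;
		plan.type = PGMAP_DEVICE_PRIVATE;
	}
	return plan;
}
```

Keeping both choices keyed on the same xgmi_connected_to_cpu flag, as the patch does, means the range source and the page type cannot disagree.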
[RFC PATCH 0/5] Support DEVICE_GENERIC memory in migrate_vma_*
AMD is building a system architecture for the Frontier supercomputer with a coherent interconnect between CPUs and GPUs. This hardware architecture allows the CPUs to coherently access GPU device memory. We have hardware in our labs and we are working with our partner HPE on the BIOS, firmware and software for delivery to the DOE. The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver looks it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC using devm_memremap_pages. Now we're trying to migrate data to and from that memory using the migrate_vma_* helpers so we can support page-based migration in our unified memory allocations, while also supporting CPU access to those pages. This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave correctly in the migrate_vma_* helpers. We are looking for feedback about this approach. If we're close, what's needed to make our patches acceptable upstream? If we're not close, any suggestions how else to achieve what we are trying to do (i.e. page migration and coherent CPU access to VRAM)? This work is based on HMM and our SVM memory manager that was recently upstreamed to Dave Airlie's drm-next branch [https://cgit.freedesktop.org/drm/drm/log/?h=drm-next]. On top of that we did some rework of our VRAM management for migrations to remove some incorrect assumptions, allow partially successful migrations and GPU memory mappings that mix pages in VRAM and system memory. [https://patchwork.kernel.org/project/dri-devel/list/?series=489811] In this RFC, patches 1 and 2 are for context to show how we are looking up the SPM memory and registering it with devmap. Patches 3-5 are the changes we are trying to upstream or rework to make them acceptable upstream. 
Alex Sierra (5): drm/amdkfd: add SPM support for SVM drm/amdkfd: generic type as sys mem on migration to ram include/linux/mm.h: helper to check zone device generic type mm: add generic type support for device zone page migration mm: changes to unref pages with Generic type drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 +++ drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 1 - include/linux/mm.h | 8 kernel/resource.c| 2 +- mm/memremap.c| 5 - mm/migrate.c | 13 - 6 files changed, 32 insertions(+), 12 deletions(-) -- 2.31.1
[PATCH] drm: Fix for GEM buffers with write-combine memory
The previous commit wrongly assumed that dma_mmap_wc() could be replaced by pgprot_writecombine() + dma_mmap_pages(). It did work on my setup, but did not work everywhere. Use dma_mmap_wc() when the buffer has the write-combine cache attribute, and dma_mmap_pages() when it has the non-coherent cache attribute. Signed-off-by: Paul Cercueil Reported-by: Tomi Valkeinen Fixes: cf8ccbc72d61 ("drm: Add support for GEM buffers backed by non-coherent memory") --- drivers/gpu/drm/drm_gem_cma_helper.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index 235c7a63da2b..4c3772651954 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -514,13 +514,17 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) cma_obj = to_drm_gem_cma_obj(obj); - vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); - if (!cma_obj->map_noncoherent) - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); + if (cma_obj->map_noncoherent) { + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags); + + ret = dma_mmap_pages(cma_obj->base.dev->dev, +vma, vma->vm_end - vma->vm_start, +virt_to_page(cma_obj->vaddr)); + } else { + ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr, + cma_obj->paddr, vma->vm_end - vma->vm_start); - ret = dma_mmap_pages(cma_obj->base.dev->dev, -vma, vma->vm_end - vma->vm_start, -virt_to_page(cma_obj->vaddr)); + } if (ret) drm_gem_vm_close(vma); -- 2.30.2
Re: [Freedreno] [PATCH] drm/msm: fix display snapshotting if DP or DSI is disabled
On 2021-05-27 15:03, Dmitry Baryshkov wrote: Fix following warnings generated when either DP or DSI support is disabled: drivers/gpu/drm/msm/disp/msm_disp_snapshot_util.c:141:3: error: implicit declaration of function 'msm_dp_snapshot'; did you mean 'msm_dsi_snapshot'? [-Werror=implicit-function-declaration] drivers/gpu/drm/msm/msm_kms.h:127:26: warning: 'struct msm_disp_state' declared inside parameter list will not be visible outside of this definition or declaration drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c:867:21: error: initialization of 'void (*)(struct msm_disp_state *, struct msm_kms *)' from incompatible pointer type 'void (*)(struct msm_disp_state *, struct msm_kms *)' [-Werror=incompatible-pointer-types] drivers/gpu/drm/msm/dsi/dsi.h:94:30: warning: 'struct msm_disp_state' declared inside parameter list will not be visible outside of this definition or declaration Reported-by: kernel test robot Cc: Abhinav Kumar Fixes: 1c3b7ac1a71d ("drm/msm: pass dump state as a function argument") Signed-off-by: Dmitry Baryshkov Reviewed-by: Abhinav Kumar --- drivers/gpu/drm/msm/disp/msm_disp_snapshot.h | 1 - drivers/gpu/drm/msm/dsi/dsi.h| 2 -- drivers/gpu/drm/msm/msm_drv.h| 12 +++- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h index c6174a366095..c92a9508c8d3 100644 --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h @@ -27,7 +27,6 @@ #include #include #include "msm_kms.h" -#include "dsi.h" #define MSM_DISP_SNAPSHOT_MAX_BLKS 10 diff --git a/drivers/gpu/drm/msm/dsi/dsi.h b/drivers/gpu/drm/msm/dsi/dsi.h index cea73f9c4be9..9b8e9b07eced 100644 --- a/drivers/gpu/drm/msm/dsi/dsi.h +++ b/drivers/gpu/drm/msm/dsi/dsi.h @@ -91,8 +91,6 @@ static inline bool msm_dsi_device_connected(struct msm_dsi *msm_dsi) return msm_dsi->panel || msm_dsi->external_bridge; } -void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct 
msm_dsi *msm_dsi); - struct drm_encoder *msm_dsi_get_encoder(struct msm_dsi *msm_dsi); /* dsi host */ diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index c33fc1293789..ba60bf6f124c 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -43,6 +43,7 @@ struct msm_gem_submit; struct msm_fence_context; struct msm_gem_address_space; struct msm_gem_vma; +struct msm_disp_state; #define MAX_CRTCS 8 #define MAX_PLANES 20 @@ -340,6 +341,8 @@ void __init msm_dsi_register(void); void __exit msm_dsi_unregister(void); int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device *dev, struct drm_encoder *encoder); +void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi *msm_dsi); + #else static inline void __init msm_dsi_register(void) { @@ -353,6 +356,10 @@ static inline int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, { return -EINVAL; } +static inline void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi *msm_dsi) +{ +} + #endif #ifdef CONFIG_DRM_MSM_DP @@ -367,7 +374,6 @@ void msm_dp_display_mode_set(struct msm_dp *dp, struct drm_encoder *encoder, struct drm_display_mode *mode, struct drm_display_mode *adjusted_mode); void msm_dp_irq_postinstall(struct msm_dp *dp_display); -struct msm_disp_state; void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp *dp_display); void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor); @@ -412,6 +418,10 @@ static inline void msm_dp_irq_postinstall(struct msm_dp *dp_display) { } +static inline void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp *dp_display) +{ +} + static inline void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor) {
[PATCH] drm/msm: fix display snapshotting if DP or DSI is disabled
Fix following warnings generated when either DP or DSI support is disabled: drivers/gpu/drm/msm/disp/msm_disp_snapshot_util.c:141:3: error: implicit declaration of function 'msm_dp_snapshot'; did you mean 'msm_dsi_snapshot'? [-Werror=implicit-function-declaration] drivers/gpu/drm/msm/msm_kms.h:127:26: warning: 'struct msm_disp_state' declared inside parameter list will not be visible outside of this definition or declaration drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c:867:21: error: initialization of 'void (*)(struct msm_disp_state *, struct msm_kms *)' from incompatible pointer type 'void (*)(struct msm_disp_state *, struct msm_kms *)' [-Werror=incompatible-pointer-types] drivers/gpu/drm/msm/dsi/dsi.h:94:30: warning: 'struct msm_disp_state' declared inside parameter list will not be visible outside of this definition or declaration Reported-by: kernel test robot Cc: Abhinav Kumar Fixes: 1c3b7ac1a71d ("drm/msm: pass dump state as a function argument") Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/msm_disp_snapshot.h | 1 - drivers/gpu/drm/msm/dsi/dsi.h| 2 -- drivers/gpu/drm/msm/msm_drv.h| 12 +++- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h index c6174a366095..c92a9508c8d3 100644 --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.h @@ -27,7 +27,6 @@ #include #include #include "msm_kms.h" -#include "dsi.h" #define MSM_DISP_SNAPSHOT_MAX_BLKS 10 diff --git a/drivers/gpu/drm/msm/dsi/dsi.h b/drivers/gpu/drm/msm/dsi/dsi.h index cea73f9c4be9..9b8e9b07eced 100644 --- a/drivers/gpu/drm/msm/dsi/dsi.h +++ b/drivers/gpu/drm/msm/dsi/dsi.h @@ -91,8 +91,6 @@ static inline bool msm_dsi_device_connected(struct msm_dsi *msm_dsi) return msm_dsi->panel || msm_dsi->external_bridge; } -void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi *msm_dsi); - struct drm_encoder *msm_dsi_get_encoder(struct 
msm_dsi *msm_dsi); /* dsi host */ diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index c33fc1293789..ba60bf6f124c 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -43,6 +43,7 @@ struct msm_gem_submit; struct msm_fence_context; struct msm_gem_address_space; struct msm_gem_vma; +struct msm_disp_state; #define MAX_CRTCS 8 #define MAX_PLANES 20 @@ -340,6 +341,8 @@ void __init msm_dsi_register(void); void __exit msm_dsi_unregister(void); int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device *dev, struct drm_encoder *encoder); +void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi *msm_dsi); + #else static inline void __init msm_dsi_register(void) { @@ -353,6 +356,10 @@ static inline int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, { return -EINVAL; } +static inline void msm_dsi_snapshot(struct msm_disp_state *disp_state, struct msm_dsi *msm_dsi) +{ +} + #endif #ifdef CONFIG_DRM_MSM_DP @@ -367,7 +374,6 @@ void msm_dp_display_mode_set(struct msm_dp *dp, struct drm_encoder *encoder, struct drm_display_mode *mode, struct drm_display_mode *adjusted_mode); void msm_dp_irq_postinstall(struct msm_dp *dp_display); -struct msm_disp_state; void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp *dp_display); void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor); @@ -412,6 +418,10 @@ static inline void msm_dp_irq_postinstall(struct msm_dp *dp_display) { } +static inline void msm_dp_snapshot(struct msm_disp_state *disp_state, struct msm_dp *dp_display) +{ +} + static inline void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor) { -- 2.30.2
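The fix relies on a common kernel header idiom with two parts. First, the struct is declared once at file scope. A struct that is first mentioned inside a parameter list only has prototype scope, so each prototype gets its own distinct type, which is why the dpu_kms.c error shows two identical-looking but "incompatible" pointer types. Second, every config-dependent function gets a static inline no-op stub in the #else branch, so callers build without their own #ifdefs. The following minimal standalone sketch uses hypothetical stand-ins: HAVE_DSI, struct disp_state, struct dsi, dsi_snapshot() and snapshot_all() play the roles of CONFIG_DRM_MSM_DSI, struct msm_disp_state, struct msm_dsi, msm_dsi_snapshot() and its caller.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope forward declarations: every prototype below now refers to the
 * same struct types instead of a new type scoped to one parameter list. */
struct disp_state;
struct dsi;

/* HAVE_DSI is a hypothetical stand-in for CONFIG_DRM_MSM_DSI; it is left
 * undefined here so the stub path is the one compiled. */
#ifdef HAVE_DSI
void dsi_snapshot(struct disp_state *state, struct dsi *dsi);
#else
/* No-op stub: callers still compile when the feature is disabled. */
static inline void dsi_snapshot(struct disp_state *state, struct dsi *dsi)
{
	(void)state;
	(void)dsi;
}
#endif

/* Hypothetical caller: it needs no #ifdef of its own. */
static int snapshot_all(struct disp_state *state, struct dsi *dsi)
{
	dsi_snapshot(state, dsi);
	return 0;
}
```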
Linux Graphics Next: Userspace submission update
Hi,

Since Christian believes that we can't deadlock the kernel with some changes there, we just need to make everything nice for userspace too. Instead of explaining how it will work, I will explain the cases where future hardware (and its kernel driver) will break existing userspace in order to protect everybody from deadlocks. Anything that uses implicit sync will be spared, so X and Wayland will be fine, assuming they don't import/export fences. Use cases that do import/export fences might or might not work, depending on how the fences are used.

One of the necessities is that all fences will become future fences. The semantics of imported/exported fences will change completely, with new restrictions on their usage. The restrictions are:

1) Android sync files will be impossible to support, so they won't be supported (they don't allow future fences).

2) Implicit sync and explicit sync will be mutually exclusive between processes. A process can use one or the other, but not both. This is meant to prevent a deadlock condition with future fences where any process can malevolently deadlock execution of any other process, even a higher-privileged process. The kernel will impose the following restrictions to protect against the deadlock:
a) a process with an implicitly-sync'd imported/exported buffer can't import/export a fence from/to another process
b) a process with an imported/exported fence can't import/export an implicitly-sync'd buffer from/to another process

Alternative: a higher-privileged process could enforce both restrictions itself, instead of the kernel, to protect itself from the deadlock, but this would be a can of worms for existing userspace. It would be better if the kernel just broke unsafe userspace on future hw, just like sync files.
If both implicit and explicit sync are allowed to occur simultaneously, sending a future fence that will never signal to any process will deadlock that process after it acquires the implicit sync lock. The implicit sync lock is a sequence number that the process is required to write to memory (sending an interrupt from the GPU) within a finite time. This is how the deadlock can happen:
* The process gets sequence number N from the kernel for an implicitly-sync'd buffer.
* The process inserts (into the GPU user-mapped queue) a wait for sequence number N-1.
* The process inserts a wait for a fence, but it doesn't know that the fence will never signal ==> deadlock.
...
* The process inserts a command to write sequence number N to a predetermined memory location (which will make the buffer idle and send an interrupt to the kernel).
...
* The kernel will terminate the process because it has never received the interrupt (i.e. a less-privileged process just killed a more-privileged process).

The termination was caused by the implicit sync interrupt that never arrived, and the only way another process can cause that is by sending a fence that will never signal. Thus, importing/exporting fences from/to other processes can't be allowed simultaneously with implicit sync.

3) Compositors (and other privileged processes, and display flipping) can't trust imported/exported fences. They need a timeout recovery mechanism from the beginning. Some possible solutions to timeouts are:
a) use a CPU wait with a small absolute timeout, and display the previous content on timeout
b) use a GPU wait with a small absolute timeout, and let conditional rendering choose between the latest content (if signalled) and the previous content (if timed out)

The result would be that the desktop can run close to 60 fps even if an app runs at 1 fps.
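Option a) can be sketched in userspace C with poll(), because sync_file file descriptors report POLLIN once their fence has signaled. This is a minimal sketch, not compositor code. pick_buffer() is a hypothetical helper and the buffer ids are placeholders. Also, poll() takes a relative timeout, so a caller working from an absolute deadline (for example, the next vblank) would pass the time remaining until that deadline.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h> /* pipe() and write(), used in the usage example */

/*
 * Hypothetical helper: return new_buf if fence_fd becomes readable (the
 * fence signaled) within timeout_ms, otherwise return prev_buf so the
 * display keeps showing the previous content.
 */
static int pick_buffer(int fence_fd, int timeout_ms, int new_buf, int prev_buf)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n > 0 && (pfd.revents & POLLIN))
		return new_buf;

	/* Timed out (n == 0) or poll() failed (n < 0): keep the old frame. */
	return prev_buf;
}
```

With this approach, a fence that never signals costs the compositor at most one timeout per frame instead of a hang.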
*Redefining imported/exported fences and breaking some users/OSs is the only way to have userspace GPU command submission, and the deadlock example here is the counterexample proving that there is no other way.* So, what are the chances this is going to fly with the ecosystem? Thanks, Marek
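Restrictions 2a) and 2b) in the proposal amount to a small per-process state machine. This minimal sketch assumes the kernel tracks two flags per process. The struct and function names are hypothetical, and a real implementation would return an error code rather than a bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state for restrictions 2a) and 2b). */
struct proc_sync_state {
	bool shares_implicit_buf; /* imported/exported an implicitly-sync'd buffer */
	bool shares_fence;        /* imported/exported a fence */
};

/* Rule a): a process sharing implicitly-sync'd buffers may not share fences. */
static bool try_share_fence(struct proc_sync_state *p)
{
	if (p->shares_implicit_buf)
		return false;
	p->shares_fence = true;
	return true;
}

/* Rule b): a process sharing fences may not share implicitly-sync'd buffers. */
static bool try_share_implicit_buf(struct proc_sync_state *p)
{
	if (p->shares_fence)
		return false;
	p->shares_implicit_buf = true;
	return true;
}
```

Once a process has taken either path, the other path is closed for good. This is what makes the two sync models mutually exclusive per process rather than per buffer.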
Re: [v4 1/4] drm/panel-simple: Add basic DPCD backlight support
Hi, On Thu, May 27, 2021 at 5:21 AM wrote: > > >> @@ -171,6 +172,19 @@ struct panel_desc { > >> > >> /** @connector_type: LVDS, eDP, DSI, DPI, etc. */ > >> int connector_type; > >> + > >> + /** > >> +* @uses_dpcd_backlight: Panel supports eDP dpcd backlight > >> control. > >> +* > >> +* Set true, if the panel supports backlight control over eDP > >> AUX channel > >> +* using DPCD registers as per VESA's standard. > >> +*/ > >> + bool uses_dpcd_backlight; > >> +}; > >> + > >> +struct edp_backlight { > >> + struct backlight_device *dev; > > > > Can you pick a name other than "dev". In my mind "dev" means you've > > got a "struct device" or a "struct device *". > > In the backlight.h "bd" is used for "struct backlight_device". I can use > "bd"? That would be OK w/ me since it's not "dev". In theory you could also call it "base" like panel-simple does with the base class drm_panel, but I'll leave that up to you. It's mostly that in my brain "dev" is reserved for "struct device" but otherwise I'm pretty flexible. > >> + struct drm_edp_backlight_info info; > >> }; > >> > >> struct panel_simple { > >> @@ -194,6 +208,8 @@ struct panel_simple { > >> > >> struct edid *edid; > >> > >> + struct edp_backlight *edp_bl; > >> + > > > > I don't think you need to add this pointer. See below for details, but > > basically the backlight device should be in base.backlight. Any code > > that needs the containing structure can use the standard > > "container_of" syntax. > > > > The documentation of the "struct drm_panel -> backlight" mentions > "backlight is set by drm_panel_of_backlight() and drivers shall not > assign it." > That's why I was not sure if I should touch that part. Because of this, > I added > backlight enable/disable calls inside panel_simple_disable/enable(). Fair enough. In my opinion (subject to being overridden by the adults in the room), if you move your backlight code into drm_panel.c and call it drm_panel_dp_aux_backlight() then it's fair game to use. 
This basically means that it's no longer a "driver" assigning it since it's being done in drm_panel.c. ;-) Obviously you'd want to update the comment, too... > >> + err = drm_panel_of_backlight(&panel->base); > >> + if (err) > >> + goto disable_pm_runtime; > >> + } > > > > See above where I'm suggesting some different logic. Specifically: > > always try the drm_panel_of_backlight() call and then fallback to the > > AUX backlight if "panel->base.backlight" is NULL and "panel->aux" is > > not NULL. > > What I understood: > 1. Create a new API drm_panel_dp_aux_backlight() in drm_panel.c > 1.1. Register DP AUX backlight if "struct drm_dp_aux" is given and > drm_edp_backlight_supported() > 2. Create a call back function for backlight ".update_status()" inside > drm_panel.c ? >This function should also handle the backlight enable/disable > operations. > 3. Use the suggested rules to call drm_panel_dp_aux_backlight() as a > fallback, if > no backlight is specified in the DT. > 4. Remove the @uses_dpcd_backlight flag from panel_desc as this should > be auto-detected. This sounds about right to me. As per all of my reviews in the DRM subsystem, this is all just my opinion and if someone more senior in DRM contradicts me then, of course, you might have to change directions. Hopefully that doesn't happen but it's always good to give warning... -Doug
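The fallback order summarized above (steps 1.1 and 3) can be sketched as a pure decision function. The enum and pick_backlight() are hypothetical. dt_backlight_found stands for drm_panel_of_backlight() having set panel->base.backlight, and aux_backlight_supported stands for the result of drm_edp_backlight_supported().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where the panel's backlight comes from. */
enum backlight_source {
	BACKLIGHT_NONE,
	BACKLIGHT_FROM_DT,     /* drm_panel_of_backlight() found one */
	BACKLIGHT_FROM_DP_AUX, /* fallback: eDP DPCD backlight over AUX */
};

static enum backlight_source pick_backlight(bool dt_backlight_found,
					    bool have_aux,
					    bool aux_backlight_supported)
{
	/* Always prefer a backlight described in the device tree. */
	if (dt_backlight_found)
		return BACKLIGHT_FROM_DT;
	/* Otherwise fall back to DPCD control, only if the panel supports it. */
	if (have_aux && aux_backlight_supported)
		return BACKLIGHT_FROM_DP_AUX;
	return BACKLIGHT_NONE;
}
```

Because support is detected at probe time, a flag like @uses_dpcd_backlight in panel_desc becomes unnecessary, as step 4 suggests.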
Re: [PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace
> The main outstanding question is the proper name. Performance monitoring > ("PERMON") is the name used by kbase, but it's jargon-y and risks > confusion with performance counters, an orthogonal mechanism. Cycle > count is more descriptive and matches the actual hardware name, but > obscures that the same mechanism is required for GPU timestamps. This > bit of bikeshedding aside, I'm pleased with the patches. PANFROST_JD_REQ_CLOCK might be the clearest.
[PATCH 10/10] drm/amdkfd: protect svm_bo ref in case prange has forked
From: Alex Sierra Keep track of all the pages inside pranges that reference the same svm_bo, using the ref count inside this object. This ensures the object is freed only after the last prange is no longer on any GPU, including references shared between a parent and child during a fork. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +- 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index acb9f64577a0..c8ca3252cbc2 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -245,7 +245,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) struct page *page; page = pfn_to_page(pfn); - page->zone_device_data = prange; + page->zone_device_data = prange->svm_bo; get_page(page); lock_page(page); } @@ -336,6 +336,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange, svm_migrate_get_vram_page(prange, migrate->dst[i]); migrate->dst[i] = migrate_pfn(migrate->dst[i]); migrate->dst[i] |= MIGRATE_PFN_LOCKED; + svm_range_bo_ref(prange->svm_bo); } if (migrate->dst[i] & MIGRATE_PFN_VALID) { spage = migrate_pfn_to_page(migrate->src[i]); @@ -540,7 +541,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, static void svm_migrate_page_free(struct page *page) { - /* Keep this function to avoid warning */ + struct svm_range_bo *svm_bo = page->zone_device_data; + + if (svm_bo) { + pr_debug("svm_bo ref left: %d\n", kref_read(&svm_bo->kref)); + svm_range_bo_unref(svm_bo); + } } static int diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 1e15a6170635..2bc20752ee30 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -309,14 +309,6 @@ static bool
svm_bo_ref_unless_zero(struct svm_range_bo *svm_bo) return true; } -static struct svm_range_bo *svm_range_bo_ref(struct svm_range_bo *svm_bo) -{ - if (svm_bo) - kref_get(&svm_bo->kref); - - return svm_bo; -} - static void svm_range_bo_release(struct kref *kref) { struct svm_range_bo *svm_bo; @@ -355,7 +347,7 @@ static void svm_range_bo_release(struct kref *kref) kfree(svm_bo); } -static void svm_range_bo_unref(struct svm_range_bo *svm_bo) +void svm_range_bo_unref(struct svm_range_bo *svm_bo) { if (!svm_bo) return; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 27fbe1936493..21f693767a0d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -150,6 +150,14 @@ static inline void svm_range_unlock(struct svm_range *prange) mutex_unlock(&prange->lock); } +static inline struct svm_range_bo *svm_range_bo_ref(struct svm_range_bo *svm_bo) +{ + if (svm_bo) + kref_get(&svm_bo->kref); + + return svm_bo; +} + int svm_range_list_init(struct kfd_process *p); void svm_range_list_fini(struct kfd_process *p); int svm_ioctl(struct kfd_process *p, enum kfd_ioctl_svm_op op, uint64_t start, @@ -178,7 +186,7 @@ void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr, void svm_range_free_dma_mappings(struct svm_range *prange); void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm, void *owner); - +void svm_range_bo_unref(struct svm_range_bo *svm_bo); #else struct kfd_process; -- 2.31.1
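The lifetime rule in this patch is ordinary reference counting. Each prange using the svm_bo holds a reference, every VRAM page that points at it through zone_device_data holds another, and svm_migrate_page_free() drops the page's reference. The following userspace sketch uses C11 atomics in place of struct kref. struct bo, bo_create(), bo_ref() and bo_unref() are hypothetical stand-ins for struct svm_range_bo, svm_range_bo_ref() and svm_range_bo_unref().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct svm_range_bo; the kernel uses struct kref. */
struct bo {
	atomic_int refcount;
};

static int bos_released; /* number of objects freed, for the usage example */

static struct bo *bo_create(void)
{
	struct bo *bo = malloc(sizeof(*bo));

	if (bo)
		atomic_init(&bo->refcount, 1); /* the creating prange's reference */
	return bo;
}

/* Like svm_range_bo_ref(): NULL-tolerant, returns its argument. */
static struct bo *bo_ref(struct bo *bo)
{
	if (bo)
		atomic_fetch_add(&bo->refcount, 1);
	return bo;
}

/* Like svm_range_bo_unref(): dropping the last reference frees the object. */
static void bo_unref(struct bo *bo)
{
	if (!bo)
		return;
	if (atomic_fetch_sub(&bo->refcount, 1) == 1) {
		free(bo);
		bos_released++;
	}
}
```

With three migrated pages, the object survives the prange dropping its reference and is freed only when the third page is freed. A forked child that still maps those pages creates exactly this situation.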
[PATCH 08/10] drm/amdkfd: add invalid pages debug at vram migration
From: Alex Sierra This is for debug purposes only. It conditionally forces partial migrations, so that pranges with pages in mixed CPU/GPU memory domains are easy to test. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 8a3f21d76915..f71f8d7e2b72 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -404,6 +404,20 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange, } } +#ifdef DEBUG_FORCE_MIXED_DOMAINS + for (i = 0, j = 0; i < npages; i += 4, j++) { + if (j & 1) + continue; + svm_migrate_put_vram_page(adev, dst[i]); + migrate->dst[i] = 0; + svm_migrate_put_vram_page(adev, dst[i + 1]); + migrate->dst[i + 1] = 0; + svm_migrate_put_vram_page(adev, dst[i + 2]); + migrate->dst[i + 2] = 0; + svm_migrate_put_vram_page(adev, dst[i + 3]); + migrate->dst[i + 3] = 0; + } +#endif out: return r; } -- 2.31.1
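The DEBUG_FORCE_MIXED_DOMAINS loop walks the pages in groups of four and releases the VRAM destination of every even-numbered group (j = 0, 2, 4, ...), so those pages stay in system memory. As written, it touches dst[i + 1] through dst[i + 3] without checking them against npages, so it appears to assume that npages is a multiple of four. The following per-page restatement is a hypothetical sketch that makes the pattern explicit and does not rely on that assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page form of the debug loop: true if the page at
 * page_idx has its VRAM destination released (stays in system memory). */
static bool debug_forced_to_stay(size_t page_idx)
{
	size_t group = page_idx / 4; /* the loop's j counter */

	return (group & 1) == 0;
}

/* Count how many of npages the debug path keeps out of VRAM. */
static size_t debug_count_forced(size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (debug_forced_to_stay(i))
			n++;
	return n;
}
```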
[PATCH 09/10] drm/amdkfd: partially actual_loc removed
From: Alex Sierra actual_loc should not be used anymore, as pranges could have mixed locations (VRAM & SYSRAM) at the same time. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +--- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 71 ++-- 2 files changed, 29 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index f71f8d7e2b72..acb9f64577a0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -501,12 +501,6 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, struct amdgpu_device *adev; int r = 0; - if (prange->actual_loc == best_loc) { - pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n", -prange->svms, prange->start, prange->last, best_loc); - return 0; - } - adev = svm_range_get_adev_by_id(prange, best_loc); if (!adev) { pr_debug("failed to get device by id 0x%x\n", best_loc); @@ -791,11 +785,7 @@ int svm_migrate_to_vram(struct svm_range *prange, uint32_t best_loc, struct mm_struct *mm) { - if (!prange->actual_loc) - return svm_migrate_ram_to_vram(prange, best_loc, mm); - else - return svm_migrate_vram_to_vram(prange, best_loc, mm); - + return svm_migrate_ram_to_vram(prange, best_loc, mm); } /** diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 7b50395ec377..1e15a6170635 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1420,42 +1420,38 @@ static int svm_range_validate_and_map(struct mm_struct *mm, svm_range_reserve_bos(&ctx); - if (!prange->actual_loc) { - p = container_of(prange->svms, struct kfd_process, svms); - owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap, - MAX_GPU_INSTANCE)); - for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) { - if (kfd_svm_page_owner(p, idx) != owner) { - owner = NULL; - break; - } - } - r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, - 
prange->start << PAGE_SHIFT, - prange->npages, &hmm_range, - false, true, owner); - if (r) { - pr_debug("failed %d to get svm range pages\n", r); - goto unreserve_out; - } - - r = svm_range_dma_map(prange, ctx.bitmap, - hmm_range->hmm_pfns); - if (r) { - pr_debug("failed %d to dma map range\n", r); - goto unreserve_out; + p = container_of(prange->svms, struct kfd_process, svms); + owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap, + MAX_GPU_INSTANCE)); + for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) { + if (kfd_svm_page_owner(p, idx) != owner) { + owner = NULL; + break; } + } + r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, + prange->start << PAGE_SHIFT, + prange->npages, &hmm_range, + false, true, owner); + if (r) { + pr_debug("failed %d to get svm range pages\n", r); + goto unreserve_out; + } - prange->validated_once = true; + r = svm_range_dma_map(prange, ctx.bitmap, + hmm_range->hmm_pfns); + if (r) { + pr_debug("failed %d to dma map range\n", r); + goto unreserve_out; } + prange->validated_once = true; + svm_range_lock(prange); - if (!prange->actual_loc) { - if (amdgpu_hmm_range_get_pages_done(hmm_range)) { - pr_debug("hmm update the range, need validate again\n"); - r = -EAGAIN; - goto unlock_out; - } + if (amdgpu_hmm_range_get_pages_done(hmm_range)) { + pr_debug("hmm update the range, need validate again\n"); + r = -EAGAIN; + goto unlock_out; } if (!list_empty(&prange->child_list)) { pr_debug("range split by unmap in parallel, validate again\n"); @@ -2740,20 +2736,9 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange, *migrated = false;
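With actual_loc gone, svm_range_validate_and_map() always runs the owner selection. It picks the page owner of the first GPU in ctx.bitmap and falls back to NULL as soon as any other selected GPU has a different owner. The result is then passed as the owner argument of amdgpu_hmm_range_get_pages(). The following standalone sketch shows that rule. common_owner(), the owners[] array, the bitmask and SKETCH_MAX_GPUS are hypothetical simplifications of kfd_svm_page_owner(), find_first_bit(), for_each_set_bit() and MAX_GPU_INSTANCE.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_GPUS 8 /* hypothetical bound for the sketch */

/*
 * Return the owner shared by every GPU whose bit is set in mask, or NULL
 * if the selected GPUs have different owners (or none is selected).
 */
static const void *common_owner(const void *const owners[SKETCH_MAX_GPUS],
				unsigned int mask)
{
	const void *owner = NULL;
	int have_first = 0;

	for (int idx = 0; idx < SKETCH_MAX_GPUS; idx++) {
		if (!(mask & (1u << idx)))
			continue;
		if (!have_first) {
			owner = owners[idx]; /* like find_first_bit() */
			have_first = 1;
		} else if (owners[idx] != owner) {
			return NULL; /* mixed owners: no single owner */
		}
	}
	return owner;
}
```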
[PATCH 07/10] drm/amdkfd: skip migration for pages already in VRAM
From: Alex Sierra Skip migration for pages that are already in the VRAM domain. These can result from a previous partial migration to system RAM followed by a prefetch back to VRAM, e.g. coherent pages in VRAM that were not written/invalidated after a copy-on-write. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 6fd68528c425..8a3f21d76915 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -329,14 +329,15 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange, for (i = j = 0; i < npages; i++) { struct page *spage; - dst[i] = vram_addr + (j << PAGE_SHIFT); - migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]); - svm_migrate_get_vram_page(prange, migrate->dst[i]); - - migrate->dst[i] = migrate_pfn(migrate->dst[i]); - migrate->dst[i] |= MIGRATE_PFN_LOCKED; - - if (migrate->src[i] & MIGRATE_PFN_VALID) { + spage = migrate_pfn_to_page(migrate->src[i]); + if (spage && !is_zone_device_page(spage)) { + dst[i] = vram_addr + (j << PAGE_SHIFT); + migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]); + svm_migrate_get_vram_page(prange, migrate->dst[i]); + migrate->dst[i] = migrate_pfn(migrate->dst[i]); + migrate->dst[i] |= MIGRATE_PFN_LOCKED; + } + if (migrate->dst[i] & MIGRATE_PFN_VALID) { spage = migrate_pfn_to_page(migrate->src[i]); src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE, DMA_TO_DEVICE); -- 2.31.1
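After this patch, a VRAM destination is allocated only when the source entry maps to an ordinary page. Entries with no page, and pages that are already zone-device (VRAM) pages, are left without a destination and are therefore not copied. The following hypothetical sketch reduces each migrate->src[] entry to one of three kinds. The enum and plan_vram_dst() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a migrate->src[] entry. */
enum src_kind {
	SRC_NONE,        /* migrate_pfn_to_page() returned NULL */
	SRC_SYSTEM_RAM,  /* ordinary page: gets a VRAM destination */
	SRC_ZONE_DEVICE, /* already device (VRAM) memory: skipped */
};

/* Mark which pages get a VRAM destination and return how many do. */
static size_t plan_vram_dst(const enum src_kind *src, size_t npages,
			    bool *needs_dst)
{
	size_t count = 0;

	for (size_t i = 0; i < npages; i++) {
		needs_dst[i] = (src[i] == SRC_SYSTEM_RAM);
		if (needs_dst[i])
			count++;
	}
	return count;
}
```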
[PATCH 06/10] drm/amdkfd: skip invalid pages during migrations
From: Alex Sierra Invalid pages can be pages that were already migrated by a copy-on-write, or pages that were never migrated to VRAM in the first place. This is no longer an issue, as pranges now support mixed memory domains (CPU/GPU). Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 38 +++- 1 file changed, 18 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index b298aa8dea4d..6fd68528c425 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -419,7 +419,6 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange, size_t size; void *buf; int r = -ENOMEM; - int retry = 0; memset(&migrate, 0, sizeof(migrate)); migrate.vma = vma; @@ -438,7 +437,6 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange, migrate.dst = migrate.src + npages; scratch = (dma_addr_t *)(migrate.dst + npages); -retry: r = migrate_vma_setup(&migrate); if (r) { pr_debug("failed %d prepare migrate svms 0x%p [0x%lx 0x%lx]\n", @@ -446,17 +444,9 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange, goto out_free; } if (migrate.cpages != npages) { - pr_debug("collect 0x%lx/0x%llx pages, retry\n", migrate.cpages, + pr_debug("Partial migration.
0x%lx/0x%llx pages can be migrated\n", +migrate.cpages, npages); - migrate_vma_finalize(&migrate); - if (retry++ >= 3) { - r = -ENOMEM; - pr_debug("failed %d migrate svms 0x%p [0x%lx 0x%lx]\n", -r, prange->svms, prange->start, prange->last); - goto out_free; - } - - goto retry; } if (migrate.cpages) { @@ -547,9 +537,8 @@ static void svm_migrate_page_free(struct page *page) static int svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, struct migrate_vma *migrate, struct dma_fence **mfence, - dma_addr_t *scratch) + dma_addr_t *scratch, uint64_t npages) { - uint64_t npages = migrate->cpages; struct device *dev = adev->dev; uint64_t *src; dma_addr_t *dst; @@ -566,15 +555,23 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, src = (uint64_t *)(scratch + npages); dst = scratch; - for (i = 0, j = 0; i < npages; i++, j++, addr += PAGE_SIZE) { + for (i = 0, j = 0; i < npages; i++, addr += PAGE_SIZE) { struct page *spage; spage = migrate_pfn_to_page(migrate->src[i]); - if (!spage) { - pr_debug("failed get spage svms 0x%p [0x%lx 0x%lx]\n", + if (!spage || !is_zone_device_page(spage)) { + pr_debug("invalid page. 
Could be in CPU already svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start, prange->last); - r = -ENOMEM; - goto out_oom; + if (j) { + r = svm_migrate_copy_memory_gart(adev, dst + i - j, +src + i - j, j, + FROM_VRAM_TO_RAM, +mfence); + if (r) + goto out_oom; + j = 0; + } + continue; } src[i] = svm_migrate_addr(adev, spage); if (i > 0 && src[i] != src[i - 1] + PAGE_SIZE) { @@ -607,6 +604,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, migrate->dst[i] = migrate_pfn(page_to_pfn(dpage)); migrate->dst[i] |= MIGRATE_PFN_LOCKED; + j++; } r = svm_migrate_copy_memory_gart(adev, dst + i - j, src + i - j, j, @@ -664,7 +662,7 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange, if (migrate.cpages) { r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence, - scratch); + scratch, npages); migrate_vma_pages(&migrate); svm_migrate_copy_done(adev, mfence); migrate_vma_finalize(&migrate); -- 2.31.1
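The reworked loop in `svm_migrate_copy_to_ram()` keeps a running count `j` of consecutive VRAM pages. Whenever an invalid page interrupts the run, it flushes the accumulated pages as one `svm_migrate_copy_memory_gart()` call. The sketch below reproduces only that bookkeeping, under some simplifying assumptions: a plain `valid[]` array stands in for the page checks, batches are recorded instead of copied, and the separate split on physical non-contiguity in the real loop is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* One batched copy request: `len` pages starting at index `start`. */
struct copy_batch {
	size_t start;
	size_t len;
};

/*
 * valid[i] is false for pages that are not in VRAM (already in system
 * memory, or never migrated). Consecutive valid pages are counted in j
 * and emitted as one batch when an invalid page ends the run, plus a
 * final flush after the loop. Returns the number of batches in `out`.
 */
static size_t batch_valid_pages(const bool *valid, size_t npages,
				struct copy_batch *out)
{
	size_t i, j = 0, nbatch = 0;

	for (i = 0; i < npages; i++) {
		if (!valid[i]) {
			/* Flush the run this invalid page just ended. */
			if (j) {
				out[nbatch].start = i - j;
				out[nbatch].len = j;
				nbatch++;
				j = 0;
			}
			continue;
		}
		j++; /* extend the current run, like the trailing j++ */
	}
	/* Final flush, like the copy issued after the loop. */
	if (j) {
		out[nbatch].start = i - j;
		out[nbatch].len = j;
		nbatch++;
	}
	return nbatch;
}
```

The indices `i - j` match the `dst + i - j, src + i - j` arguments in the patch. Each batch therefore begins at the first page of the run that was just closed.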
[PATCH 05/10] drm/amdkfd: classify and map mixed svm range pages in GPU
From: Alex Sierra [Why] svm ranges can have mixed pages from device or system memory. A good example is a prange that has been allocated in VRAM when a fork triggers a copy-on-write: this invalidates some pages inside the prange, ending up with mixed pages. [How] Classify each page inside a prange by its type (device or system memory) during the dma mapping call. If a page corresponds to the VRAM domain, a flag is set in its dma_addr entry for each GPU. Then, at GPU page table mapping time, each group of contiguous pages of the same type is mapped with its proper pte flags. v2: Instead of using ttm_res to calculate vram pfns in the svm_range, this is now done by storing the real VRAM physical address in the dma_addr array. This makes VRAM management more flexible and removes the need to keep a BO reference in the svm_range. v3: Remove mapping member from svm_range Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 72 +--- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 2 +- 2 files changed, 46 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 2b4318646a75..7b50395ec377 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -119,11 +119,12 @@ static void svm_range_remove_notifier(struct svm_range *prange) } static int -svm_range_dma_map_dev(struct device *dev, dma_addr_t **dma_addr, +svm_range_dma_map_dev(struct amdgpu_device *adev, dma_addr_t **dma_addr, unsigned long *hmm_pfns, uint64_t npages) { enum dma_data_direction dir = DMA_BIDIRECTIONAL; dma_addr_t *addr = *dma_addr; + struct device *dev = adev->dev; struct page *page; int i, r; @@ -141,6 +142,14 @@ svm_range_dma_map_dev(struct device *dev, dma_addr_t **dma_addr, dma_unmap_page(dev, addr[i], PAGE_SIZE, dir); page = hmm_pfn_to_page(hmm_pfns[i]); + if (is_zone_device_page(page)) { + addr[i] = (hmm_pfns[i] << PAGE_SHIFT) + + adev->vm_manager.vram_base_offset -
adev->kfd.dev->pgmap.range.start; + addr[i] |= SVM_RANGE_VRAM_DOMAIN; + pr_debug("vram address detected: 0x%llx\n", addr[i]); + continue; + } addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir); r = dma_mapping_error(dev, addr[i]); if (r) { @@ -175,7 +184,7 @@ svm_range_dma_map(struct svm_range *prange, unsigned long *bitmap, } adev = (struct amdgpu_device *)pdd->dev->kgd; - r = svm_range_dma_map_dev(adev->dev, &prange->dma_addr[gpuidx], + r = svm_range_dma_map_dev(adev, &prange->dma_addr[gpuidx], hmm_pfns, prange->npages); if (r) break; @@ -1003,21 +1012,22 @@ svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm, } static uint64_t -svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange) +svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange, + int domain) { struct amdgpu_device *bo_adev; uint32_t flags = prange->flags; uint32_t mapping_flags = 0; uint64_t pte_flags; - bool snoop = !prange->ttm_res; + bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN); bool coherent = flags & KFD_IOCTL_SVM_FLAG_COHERENT; - if (prange->svm_bo && prange->ttm_res) + if (domain == SVM_RANGE_VRAM_DOMAIN) bo_adev = amdgpu_ttm_adev(prange->svm_bo->bo->tbo.bdev); switch (adev->asic_type) { case CHIP_ARCTURUS: - if (prange->svm_bo && prange->ttm_res) { + if (domain == SVM_RANGE_VRAM_DOMAIN) { if (bo_adev == adev) { mapping_flags |= coherent ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW; @@ -1032,7 +1042,7 @@ svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange) } break; case CHIP_ALDEBARAN: - if (prange->svm_bo && prange->ttm_res) { + if (domain == SVM_RANGE_VRAM_DOMAIN) { if (bo_adev == adev) { mapping_flags |= coherent ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW; @@ -1061,14 +1071,14 @@ svm_range_get_pte_flags(struct amdgpu_device *adev, struct svm_range *prange) mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE; pte_flags = AMDGPU_PTE_VALID; - pte_flags |= prange->ttm_res ? 
0 : AMDGPU_PTE_SYSTEM; + pte_flags |= (domain == SVM_RANGE_VRAM_DOMAIN)
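The classify-then-group idea can be sketched under a few stated assumptions. VRAM entries are assumed to be tagged with a low bit in the per-page address. The real `SVM_RANGE_VRAM_DOMAIN` value is not shown in this excerpt, so `SKETCH_VRAM_DOMAIN` is hypothetical. The mapping step is reduced to splitting the array into maximal same-domain runs, each of which would then get one set of PTE flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag: page-aligned addresses leave the low bits free. */
#define SKETCH_VRAM_DOMAIN ((uint64_t)1)

/* A run of consecutive pages that share one memory domain. */
struct map_run {
	size_t start;
	size_t len;
	uint64_t domain; /* SKETCH_VRAM_DOMAIN or 0 (system memory) */
};

/* Split the per-page address array into maximal same-domain runs. */
static size_t split_by_domain(const uint64_t *dma_addr, size_t npages,
			      struct map_run *runs)
{
	size_t i, nruns = 0;

	for (i = 0; i < npages; i++) {
		uint64_t d = dma_addr[i] & SKETCH_VRAM_DOMAIN;

		if (nruns && runs[nruns - 1].domain == d) {
			runs[nruns - 1].len++; /* same type: extend the run */
			continue;
		}
		runs[nruns].start = i; /* type changed: start a new run */
		runs[nruns].len = 1;
		runs[nruns].domain = d;
		nruns++;
	}
	return nruns;
}

/* Mirrors `snoop = (domain != SVM_RANGE_VRAM_DOMAIN)` from the patch. */
static bool run_is_snooped(const struct map_run *run)
{
	return run->domain != SKETCH_VRAM_DOMAIN;
}
```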
[PATCH 03/10] drm/amdkfd: set owner ref to svm range prefault
From: Alex Sierra svm_range_prefault is called right before migrations to VRAM, to make sure pages are resident in system memory before the migration. With partial migrations, this reference is used by hmm range get pages to avoid migrating pages that are already in the same VRAM domain. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++-- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 3 ++- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 11f7f590c6ec..b298aa8dea4d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -512,7 +512,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, prange->start, prange->last, best_loc); /* FIXME: workaround for page locking bug with invalid pages */ - svm_range_prefault(prange, mm); + svm_range_prefault(prange, mm, SVM_ADEV_PGMAP_OWNER(adev)); start = prange->start << PAGE_SHIFT; end = (prange->last + 1) << PAGE_SHIFT; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index b939f353ac8c..54f47b09b14a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -2646,7 +2646,8 @@ svm_range_best_prefetch_location(struct svm_range *prange) /* FIXME: This is a workaround for page locking bug when some pages are * invalid during migration to VRAM */ -void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm) +void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm, + void *owner) { struct hmm_range *hmm_range; int r; @@ -2657,7 +2658,7 @@ void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm) r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, prange->start << PAGE_SHIFT, prange->npages, &hmm_range, - false, true, NULL); + false, true, owner); if (!r) { 
amdgpu_hmm_range_get_pages_done(hmm_range); prange->validated_once = true; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 4297250f259d..08542fe39303 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -176,7 +176,8 @@ void schedule_deferred_list_work(struct svm_range_list *svms); void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr, unsigned long offset, unsigned long npages); void svm_range_free_dma_mappings(struct svm_range *prange); -void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm); +void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm, + void *owner); #else -- 2.31.1
[PATCH 04/10] drm/amdgpu: get owner ref in validate and map
From: Alex Sierra Get the proper owner reference for the amdgpu_hmm_range_get_pages function. This is useful for partial migrations, to avoid migrating VRAM pages that are accessible by all devices in the same memory domain (e.g. multiple devices in the same hive) back to system memory. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 54f47b09b14a..2b4318646a75 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1313,6 +1313,17 @@ static void svm_range_unreserve_bos(struct svm_validate_context *ctx) ttm_eu_backoff_reservation(&ctx->ticket, &ctx->validate_list); } +static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx) +{ + struct kfd_process_device *pdd; + struct amdgpu_device *adev; + + pdd = kfd_process_device_from_gpuidx(p, gpuidx); + adev = (struct amdgpu_device *)pdd->dev->kgd; + + return SVM_ADEV_PGMAP_OWNER(adev); +} + /* * Validation+GPU mapping with concurrent invalidation (MMU notifiers) * @@ -1343,6 +1354,9 @@ static int svm_range_validate_and_map(struct mm_struct *mm, { struct svm_validate_context ctx; struct hmm_range *hmm_range; + struct kfd_process *p; + void *owner; + int32_t idx; int r = 0; ctx.process = container_of(prange->svms, struct kfd_process, svms); @@ -1389,10 +1403,19 @@ static int svm_range_validate_and_map(struct mm_struct *mm, svm_range_reserve_bos(&ctx); if (!prange->actual_loc) { + p = container_of(prange->svms, struct kfd_process, svms); + owner = kfd_svm_page_owner(p, find_first_bit(ctx.bitmap, + MAX_GPU_INSTANCE)); + for_each_set_bit(idx, ctx.bitmap, MAX_GPU_INSTANCE) { + if (kfd_svm_page_owner(p, idx) != owner) { + owner = NULL; + break; + } + } r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, prange->start << PAGE_SHIFT, prange->npages, &hmm_range, - false, true, NULL); + false, true, owner); if
(r) { pr_debug("failed %d to get svm range pages\n", r); goto unreserve_out; -- 2.31.1
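The owner selection in `svm_range_validate_and_map()` keeps a single owner only if every GPU in the bitmap reports the same one. Otherwise it falls back to `NULL`. The sketch below shows the same rule using a plain array of owners and an `unsigned long` bitmask in place of `kfd_svm_page_owner()` and `for_each_set_bit()`. Both substitutions are assumptions made for a self-contained example.

```c
#include <stddef.h>

/*
 * owners[i] is the pgmap owner of GPU i (its hive, or the device itself).
 * Returns the shared owner if every GPU selected in `mask` agrees,
 * otherwise NULL (which makes every device-private page migrate back).
 */
static void *common_page_owner(void *const *owners, unsigned long mask,
			       size_t ngpus)
{
	void *owner = NULL;
	int first = 1;
	size_t i;

	for (i = 0; i < ngpus; i++) {
		if (!(mask & (1UL << i)))
			continue; /* GPU not part of this range's bitmap */
		if (first) {
			owner = owners[i];
			first = 0;
		} else if (owners[i] != owner) {
			return NULL; /* mixed domains: no common owner */
		}
	}
	return owner;
}
```

Returning `NULL` on any mismatch is the conservative choice. If even one GPU sits outside the shared domain, none of the device pages are treated as directly usable.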
[PATCH 02/10] drm/amdkfd: add owner ref param to get hmm pages
From: Alex Sierra The parameter is assigned to dev_private_owner to decide whether device pages in the range need to be migrated back to system memory, based on whether or not they are in the same memory domain. In this case, the reference could come from a memory domain shared by devices connected to the same hive. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c| 4 ++-- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 2741c28ff1b5..378c238c2099 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -160,7 +160,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, struct mm_struct *mm, struct page **pages, uint64_t start, uint64_t npages, struct hmm_range **phmm_range, bool readonly, - bool mmap_locked) + bool mmap_locked, void *owner) { struct hmm_range *hmm_range; unsigned long timeout; @@ -185,6 +185,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->hmm_pfns = pfns; hmm_range->start = start; hmm_range->end = start + npages * PAGE_SIZE; + hmm_range->dev_private_owner = owner; /* Assuming 512MB takes maxmium 1 second to fault page address */ timeout = max(npages >> 17, 1ULL) * HMM_RANGE_DEFAULT_TIMEOUT; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h index 7f7d37a457c3..14a3c1864085 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h @@ -34,7 +34,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, struct mm_struct *mm, struct page **pages, uint64_t start, uint64_t npages, struct hmm_range **phmm_range, bool readonly, - bool mmap_locked); + bool mmap_locked, void *owner); int
amdgpu_hmm_range_get_pages_done(struct hmm_range *hmm_range); #if defined(CONFIG_HMM_MIRROR) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 7e7d8330d64b..c13f7fbfc070 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -709,7 +709,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages) readonly = amdgpu_ttm_tt_is_readonly(ttm); r = amdgpu_hmm_range_get_pages(&bo->notifier, mm, pages, start, ttm->num_pages, >t->range, readonly, - false); + false, NULL); out_putmm: mmput(mm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index b665e9ff77e3..b939f353ac8c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1392,7 +1392,7 @@ static int svm_range_validate_and_map(struct mm_struct *mm, r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, prange->start << PAGE_SHIFT, prange->npages, &hmm_range, - false, true); + false, true, NULL); if (r) { pr_debug("failed %d to get svm range pages\n", r); goto unreserve_out; @@ -2657,7 +2657,7 @@ void svm_range_prefault(struct svm_range *prange, struct mm_struct *mm) r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL, prange->start << PAGE_SHIFT, prange->npages, &hmm_range, - false, true); + false, true, NULL); if (!r) { amdgpu_hmm_range_get_pages_done(hmm_range); prange->validated_once = true; -- 2.31.1
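This is roughly how HMM uses `dev_private_owner`. A device-private page is handed back as-is only when its pgmap owner equals the range's `dev_private_owner`. Otherwise the walk faults it, which migrates it back to system memory. The model below is simplified and hypothetical, not the `mm/hmm.c` code. Since every pgmap owner set by this series is non-NULL, passing `NULL`, as the `amdgpu_ttm.c` caller still does, always forces migration back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of one page seen by an HMM range walk. */
struct sketch_page {
	bool device_private; /* backed by a MEMORY_DEVICE_PRIVATE pgmap */
	void *pgmap_owner;   /* the pgmap->owner set at svm_migrate_init */
};

/*
 * Returns true if the walk must fault the page back to system memory,
 * false if it can be returned to the caller in place.
 */
static bool needs_migrate_back(const struct sketch_page *pg,
			       const void *dev_private_owner)
{
	if (!pg->device_private)
		return false; /* ordinary system page: nothing to do */
	return pg->pgmap_owner != dev_private_owner;
}
```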
[PATCH 01/10] drm/amdkfd: device pgmap owner at the svm migrate init
From: Alex Sierra The pgmap owner member set at svm migrate init can now reference either the adev or the hive, depending on device topology. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 6 +++--- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 3 +++ 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index fd8f544f0de2..11f7f590c6ec 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -426,7 +426,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange, migrate.start = start; migrate.end = end; migrate.flags = MIGRATE_VMA_SELECT_SYSTEM; - migrate.pgmap_owner = adev; + migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev); size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t); size *= npages; @@ -641,7 +641,7 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange, migrate.start = start; migrate.end = end; migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; - migrate.pgmap_owner = adev; + migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev); size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t); size *= npages; @@ -907,7 +907,7 @@ int svm_migrate_init(struct amdgpu_device *adev) pgmap->range.start = res->start; pgmap->range.end = res->end; pgmap->ops = &svm_migrate_pgmap_ops; - pgmap->owner = adev; + pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev); pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; r = devm_memremap_pages(adev->dev, pgmap); if (IS_ERR(r)) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 573f984b81fe..4297250f259d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -35,6 +35,9 @@ #include "amdgpu.h" #include "kfd_priv.h" +#define SVM_ADEV_PGMAP_OWNER(adev)\ + ((adev)->hive ?
(void *)(adev)->hive : (void *)(adev)) + struct svm_range_bo { struct amdgpu_bo*bo; struct kref kref; -- 2.31.1
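The effect of `SVM_ADEV_PGMAP_OWNER` can be checked in isolation with minimal stand-in structs. These are hypothetical; the real types are `struct amdgpu_device` and its hive pointer. Two devices in the same hive resolve to the same owner, while a device without a hive is its own owner.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device and its XGMI hive. */
struct sketch_hive {
	int id;
};

struct sketch_adev {
	struct sketch_hive *hive; /* NULL when the GPU is not in a hive */
};

/* Same selection rule as SVM_ADEV_PGMAP_OWNER in the patch. */
#define SKETCH_PGMAP_OWNER(adev) \
	((adev)->hive ? (void *)(adev)->hive : (void *)(adev))
```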
[PATCH 4/4] drm/panfrost: Handle PANFROST_JD_REQ_PERMON
From: Alyssa Rosenzweig If a job requires cycle counters or timestamps, we must enable cycle counting just before issuing the job, and disable as soon as the job completes. Since this extends the UABI, we bump the driver minor version and date. That lets userspace detect cycle counter support, and only advertise features like ARB_shader_clock on kernels with this commit. Signed-off-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_drv.c | 10 +++--- drivers/gpu/drm/panfrost/panfrost_job.c | 6 ++ 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ca07098a6..0f11d2df4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -20,6 +20,10 @@ #include "panfrost_gpu.h" #include "panfrost_perfcnt.h" +#define JOB_REQUIREMENTS \ + (PANFROST_JD_REQ_FS | \ +PANFROST_JD_REQ_PERMON) + static bool unstable_ioctls; module_param_unsafe(unstable_ioctls, bool, 0600); @@ -247,7 +251,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data, if (!args->jc) return -EINVAL; - if (args->requirements && args->requirements != PANFROST_JD_REQ_FS) + if (args->requirements & ~JOB_REQUIREMENTS) return -EINVAL; if (args->out_sync > 0) { @@ -557,9 +561,9 @@ static const struct drm_driver panfrost_drm_driver = { .fops = &panfrost_drm_driver_fops, .name = "panfrost", .desc = "panfrost DRM", - .date = "20180908", + .date = "20210527", .major = 1, - .minor = 1, + .minor = 2, .gem_create_object = panfrost_gem_create_object, .prime_handle_to_fd = drm_gem_prime_handle_to_fd, diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 6003cfeb1..b78147e3d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -165,6 +165,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) return; } + if (job->requirements & 
PANFROST_JD_REQ_PERMON) + panfrost_acquire_permon(job->pfdev); + cfg = panfrost_mmu_as_get(pfdev, &job->file_priv->mmu); job_write(pfdev, JS_HEAD_NEXT_LO(js), jc_head & 0x); @@ -296,6 +299,9 @@ static void panfrost_job_cleanup(struct kref *ref) kvfree(job->bos); } + if (job->requirements & PANFROST_JD_REQ_PERMON) + panfrost_release_permon(job->pfdev); + kfree(job); } -- 2.30.2
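The new submit check rejects any bit outside `JOB_REQUIREMENTS` instead of comparing against the single value `PANFROST_JD_REQ_FS`. The old form would have rejected `FS | PERMON`, while the mask form accepts any combination of known bits. A standalone sketch of the check, reusing the bit values from patch 1/4 and wrapping them in a hypothetical helper:

```c
#include <errno.h>
#include <stdint.h>

/* Bit values as defined in patch 1/4 of this series. */
#define PANFROST_JD_REQ_FS     (1 << 0)
#define PANFROST_JD_REQ_PERMON (1 << 1)

#define JOB_REQUIREMENTS (PANFROST_JD_REQ_FS | PANFROST_JD_REQ_PERMON)

/* Hypothetical helper: reject any requirement bit the driver does not know. */
static int check_requirements(uint32_t requirements)
{
	if (requirements & ~JOB_REQUIREMENTS)
		return -EINVAL;
	return 0;
}
```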
[PATCH 3/4] drm/panfrost: Add permon acquire/release helpers
From: Alyssa Rosenzweig Wrap the underlying CYCLE_COUNT_START/STOP commands in a safe interface that ensures the commands are only issued where required by guarding behind an atomic counter. In particular, we need to be careful about races between multiple in-flight jobs, where only some require cycle counts. Signed-off-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_device.h | 3 +++ drivers/gpu/drm/panfrost/panfrost_gpu.c| 20 drivers/gpu/drm/panfrost/panfrost_gpu.h| 3 +++ 3 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 597cf1459..8a89aa274 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -117,6 +117,9 @@ struct panfrost_device { struct shrinker shrinker; struct panfrost_devfreq pfdevfreq; + + /* Number of active jobs requiring performance monitoring */ + atomic_t permon_pending; }; struct panfrost_mmu { diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index 2aae636f1..acacceb15 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -399,3 +399,23 @@ u32 panfrost_gpu_get_latest_flush_id(struct panfrost_device *pfdev) return 0; } + +void panfrost_acquire_permon(struct panfrost_device *pfdev) +{ + /* If another in-flight job enabled permon, we don't have to */ + if (atomic_inc_return(&pfdev->permon_pending) > 1) + return; + + /* Otherwise, we're the first user */ + gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START); +} + +void panfrost_release_permon(struct panfrost_device *pfdev) +{ + /* If another in-flight job needs permon, keep it active */ + if (atomic_dec_return(&pfdev->permon_pending) > 0) + return; + + /* Otherwise, we're the last user */ + gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.h b/drivers/gpu/drm/panfrost/panfrost_gpu.h index 
468c51e7e..01a91af09 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.h +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.h @@ -18,4 +18,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev); void panfrost_gpu_amlogic_quirk(struct panfrost_device *pfdev); +void panfrost_acquire_permon(struct panfrost_device *pfdev); +void panfrost_release_permon(struct panfrost_device *pfdev); + #endif -- 2.30.2
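The acquire/release pair is a reference count: only the 0→1 transition issues START and only the 1→0 transition issues STOP. A userspace sketch with C11 atomics follows. `atomic_fetch_add`/`atomic_fetch_sub` return the old value, so the tests are shifted by one compared with the kernel's `atomic_inc_return`/`atomic_dec_return`. The counters `start_cmds`/`stop_cmds` are hypothetical stand-ins for the `gpu_write()` calls.

```c
#include <stdatomic.h>

/* Count the START/STOP commands "written", in place of gpu_write(). */
static int start_cmds, stop_cmds;

struct sketch_pfdev {
	atomic_int permon_pending; /* active jobs needing the counters */
};

static void sketch_acquire_permon(struct sketch_pfdev *pfdev)
{
	/* Old value > 0: another in-flight job already enabled them. */
	if (atomic_fetch_add(&pfdev->permon_pending, 1) > 0)
		return;
	start_cmds++; /* GPU_CMD_CYCLE_COUNT_START */
}

static void sketch_release_permon(struct sketch_pfdev *pfdev)
{
	/* Old value > 1: another in-flight job still needs them. */
	if (atomic_fetch_sub(&pfdev->permon_pending, 1) > 1)
		return;
	stop_cmds++; /* GPU_CMD_CYCLE_COUNT_STOP */
}
```

The atomic counter orders the decisions, but not the register writes themselves. As written, a release that drops the count to zero could race with a concurrent acquire, so a STOP might reach the GPU after the new START.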
[PATCH 2/4] drm/panfrost: Add CYCLE_COUNT_START/STOP commands
From: Alyssa Rosenzweig Add additional values of GPU_COMMAND required to enable and disable the cycle (and timestamp) counters. Values from mali_kbase. Signed-off-by: Alyssa Rosenzweig --- drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index eddaa62ad..8ac60de6f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 -- 2.30.2
[PATCH 1/4] drm/panfrost: Add cycle counter job requirement
From: Alyssa Rosenzweig Extend the Panfrost UABI with a new job requirement for cycle counters (and GPU timestamps, by extension). This requirement is used in userspace to implement ARB_shader_clock, an OpenGL extension reporting the GPU cycle count within a shader. The same mechanism will be required to implement timestamp queries as a "write value - timestamp" job. We cannot enable cycle counters unconditionally, as enabling them increases GPU power consumption. They should be left off unless actually required by the application for profiling purposes. Signed-off-by: Alyssa Rosenzweig --- include/uapi/drm/panfrost_drm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h index ec19db1ee..27e6cb941 100644 --- a/include/uapi/drm/panfrost_drm.h +++ b/include/uapi/drm/panfrost_drm.h @@ -39,7 +39,8 @@ extern "C" { #define DRM_IOCTL_PANFROST_PERFCNT_ENABLE DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_PERFCNT_ENABLE, struct drm_panfrost_perfcnt_enable) #define DRM_IOCTL_PANFROST_PERFCNT_DUMP DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_PERFCNT_DUMP, struct drm_panfrost_perfcnt_dump) -#define PANFROST_JD_REQ_FS (1 << 0) +#define PANFROST_JD_REQ_FS (1 << 0) +#define PANFROST_JD_REQ_PERMON (1 << 1) /** * struct drm_panfrost_submit - ioctl argument for submitting commands to the 3D * engine. -- 2.30.2
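Since patch 4/4 bumps the driver version to 1.2, userspace can gate the new bit on that version. The sketch below shows this; `kernel_has_permon()` and `job_requirements()` are hypothetical helpers, not Mesa code. In a real driver the version numbers would come from the DRM version query.

```c
#include <stdbool.h>
#include <stdint.h>

#define PANFROST_JD_REQ_FS     (1 << 0)
#define PANFROST_JD_REQ_PERMON (1 << 1)

/* Hypothetical: PERMON is accepted from driver version 1.2 onwards. */
static bool kernel_has_permon(int major, int minor)
{
	return major == 1 && minor >= 2;
}

/* Hypothetical helper building the submit requirements for one job. */
static uint32_t job_requirements(bool is_fragment, bool needs_clock,
				 int major, int minor)
{
	uint32_t req = 0;

	if (is_fragment)
		req |= PANFROST_JD_REQ_FS;
	/* Only request counters when the shader reads them and the kernel
	 * understands the bit; older kernels reject unknown bits. */
	if (needs_clock && kernel_has_permon(major, minor))
		req |= PANFROST_JD_REQ_PERMON;
	return req;
}
```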
[PATCH 0/4] drm/panfrost: Plumb cycle counters to userspace
From: Alyssa Rosenzweig

Mali has hardware cycle counters (and GPU timestamps) available for profiling. These are exposed in various ways:

- Kernel: As CYCLE_COUNT and TIMESTAMP registers
- Job chain: As WRITE_VALUE descriptors
- Shader (Midgard): As LD_SPECIAL selectors
- Shader (Bifrost): As the LD_GCLK.u64 instruction

These form building blocks for profiling features, for example the ARB_shader_clock extension, which accesses the counters from an application's shader.

The counters consume power, so it is recommended to disable them when not in use. To do so, we follow the strategy from mali_kbase: add a counter requirement to the job, start the counters only when required, and stop them as quickly as possible.

The new UABI will be used in Mesa. An implementation of ARB_shader_clock using this UABI is available as a pending upstream merge request [1]. The implementation passes the relevant piglit test, validating both the kernel and Mesa.

The main outstanding question is the proper name. Performance monitoring ("PERMON") is the name used by kbase, but it's jargon-y and risks confusion with performance counters, an orthogonal mechanism. Cycle count is more descriptive and matches the actual hardware name, but obscures that the same mechanism is required for GPU timestamps. This bit of bikeshedding aside, I'm pleased with the patches.
[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/11051 Alyssa Rosenzweig (4): drm/panfrost: Add cycle counter job requirement drm/panfrost: Add CYCLE_COUNT_START/STOP commands drm/panfrost: Add permon acquire/release helpers drm/panfrost: Handle PANFROST_JD_REQ_PERMON drivers/gpu/drm/panfrost/panfrost_device.h | 3 +++ drivers/gpu/drm/panfrost/panfrost_drv.c| 10 +++--- drivers/gpu/drm/panfrost/panfrost_gpu.c| 20 drivers/gpu/drm/panfrost/panfrost_gpu.h| 3 +++ drivers/gpu/drm/panfrost/panfrost_job.c| 6 ++ drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++ include/uapi/drm/panfrost_drm.h| 3 ++- 7 files changed, 43 insertions(+), 4 deletions(-) -- 2.30.2
Re: [PATCH v5 3/3] drm_dp_cec: add MST support
On Tue, 2021-05-25 at 10:59 +1000, Sam McNally wrote: > With DP v2.0 errata E5, CEC tunneling can be supported through an MST > topology. > > When tunneling CEC through an MST port, CEC IRQs are delivered via a > sink event notify message; when a sink event notify message is received, > trigger CEC IRQ handling - ESI1 is not used for remote CEC IRQs so its > value is not checked. > > Register and unregister for all MST connectors, ensuring their > drm_dp_aux_cec struct won't be accessed uninitialized. > > Reviewed-by: Hans Verkuil > Signed-off-by: Sam McNally > --- > > (no changes since v4) > > Changes in v4: > - Removed use of work queues > - Updated checks of aux.transfer to accept aux.is_remote > > Changes in v3: > - Fixed whitespace in drm_dp_cec_mst_irq_work() > - Moved drm_dp_cec_mst_set_edid_work() with the other set_edid functions > > Changes in v2: > - Used aux->is_remote instead of aux->cec.is_mst, removing the need for > the previous patch in the series > - Added a defensive check for null edid in the deferred set_edid work, > in case the edid is no longer valid at that point > > drivers/gpu/drm/drm_dp_cec.c | 20 > drivers/gpu/drm/drm_dp_mst_topology.c | 24 > 2 files changed, 40 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c > index 3ab2609f9ec7..1abd3f4654dc 100644 > --- a/drivers/gpu/drm/drm_dp_cec.c > +++ b/drivers/gpu/drm/drm_dp_cec.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > > /* > * Unfortunately it turns out that we have a chicken-and-egg situation > @@ -245,13 +246,22 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux) > int ret; > > /* No transfer function was set, so not a DP connector */ > - if (!aux->transfer) > + if (!aux->transfer && !aux->is_remote) > return; > > mutex_lock(&aux->cec.lock); > if (!aux->cec.adap) > goto unlock; > > + if (aux->is_remote) { > + /* > + * For remote connectors, CEC IRQ is triggered by an > explicit > + * message so ESI1 is not 
involved. > + */ > + drm_dp_cec_handle_irq(aux); > + goto unlock; > + } > + > ret = drm_dp_dpcd_readb(aux, DP_DEVICE_SERVICE_IRQ_VECTOR_ESI1, > &cec_irq); > if (ret < 0 || !(cec_irq & DP_CEC_IRQ)) > @@ -307,7 +317,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const > struct edid *edid) > u8 cap; > > /* No transfer function was set, so not a DP connector */ > - if (!aux->transfer) > + if (!aux->transfer && !aux->is_remote) > return; > > #ifndef CONFIG_MEDIA_CEC_RC > @@ -375,6 +385,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const > struct edid *edid) > unlock: > mutex_unlock(&aux->cec.lock); > } > + > EXPORT_SYMBOL(drm_dp_cec_set_edid); probably want to get rid of this whitespace With that fixed, this is: Reviewed-by: Lyude Paul > > /* > @@ -383,7 +394,7 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid); > void drm_dp_cec_unset_edid(struct drm_dp_aux *aux) > { > /* No transfer function was set, so not a DP connector */ > - if (!aux->transfer) > + if (!aux->transfer && !aux->is_remote) > return; > > cancel_delayed_work_sync(&aux->cec.unregister_work); > @@ -393,6 +404,7 @@ void drm_dp_cec_unset_edid(struct drm_dp_aux *aux) > goto unlock; > > cec_phys_addr_invalidate(aux->cec.adap); > + > /* > * We're done if we want to keep the CEC device > * (drm_dp_cec_unregister_delay is >= NEVER_UNREG_DELAY) or if the > @@ -428,7 +440,7 @@ void drm_dp_cec_register_connector(struct drm_dp_aux > *aux, > struct drm_connector *connector) > { > WARN_ON(aux->cec.adap); > - if (WARN_ON(!aux->transfer)) > + if (WARN_ON(!aux->transfer && !aux->is_remote)) > return; > aux->cec.connector = connector; > INIT_DELAYED_WORK(&aux->cec.unregister_work, > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 29aad3b6b31a..5612caf9fb49 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -2359,6 +2359,8 @@ static void build_mst_prop_path(const struct > drm_dp_mst_branch *mstb, > int 
drm_dp_mst_connector_late_register(struct drm_connector *connector, > struct drm_dp_mst_port *port) > { > + drm_dp_cec_register_connector(&port->aux, connector); > + > drm_dbg_kms(port->mgr->dev, "registering %s remote bus for %s\n", > port->aux.name, connector->kdev->kobj.name); > > @@ -2382,6 +2384,8 @@ void drm_dp_mst_connector_early_unregister(struct > drm_connector
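The guards changed in `drm_dp_cec.c` boil down to one predicate: proceed for a local AUX channel (a transfer function is set) or for a remote MST port (`is_remote`). For remote ports, the IRQ path skips the ESI1 read. A simplified sketch with a hypothetical stand-in struct; a non-NULL `transfer` pointer here stands in for the real callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two drm_dp_aux fields the guards read. */
struct sketch_aux {
	const void *transfer; /* non-NULL for a local DP AUX channel */
	bool is_remote;       /* true for an MST port behind a branch */
};

/* Mirrors `if (!aux->transfer && !aux->is_remote) return;`. */
static bool cec_applicable(const struct sketch_aux *aux)
{
	return aux->transfer != NULL || aux->is_remote;
}

enum cec_irq_path { CEC_IRQ_IGNORE, CEC_IRQ_DIRECT, CEC_IRQ_READ_ESI1 };

/* How drm_dp_cec_irq() proceeds once the adapter check has been done. */
static enum cec_irq_path cec_irq_path(const struct sketch_aux *aux,
				      bool has_adap)
{
	if (!cec_applicable(aux) || !has_adap)
		return CEC_IRQ_IGNORE;
	/* Remote CEC IRQs arrive via sink event notify; ESI1 is not used. */
	return aux->is_remote ? CEC_IRQ_DIRECT : CEC_IRQ_READ_ESI1;
}
```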
Re: [PATCH v5 1/3] drm/dp_mst: Add self-tests for up requests
On Tue, 2021-05-25 at 10:59 +1000, Sam McNally wrote: > Up requests are decoded by drm_dp_sideband_parse_req(), which operates > on a drm_dp_sideband_msg_rx, unlike down requests. Expand the existing > self-test helper sideband_msg_req_encode_decode() to copy the message > contents and length from a drm_dp_sideband_msg_tx to > drm_dp_sideband_msg_rx and use the parse function under test in place of > decode. Add an additional helper for testing clearly-invalid up > messages, verifying that parse rejects them. > > Add support for currently-supported up requests to > drm_dp_dump_sideband_msg_req_body(); add support to > drm_dp_encode_sideband_req() to allow encoding for the self-tests. > > Add self-tests for CONNECTION_STATUS_NOTIFY and RESOURCE_STATUS_NOTIFY. > > Signed-off-by: Sam McNally > --- > > Changes in v5: > - Set mock device name to more clearly attribute error/debug logging to > the self-test, in particular for cases where failures are expected > > Changes in v4: > - New in v4 > > drivers/gpu/drm/drm_dp_mst_topology.c | 54 ++- > .../gpu/drm/drm_dp_mst_topology_internal.h | 4 + > .../drm/selftests/test-drm_dp_mst_helper.c | 149 -- > 3 files changed, 192 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 54604633e65c..573f39a3dc16 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -442,6 +442,37 @@ drm_dp_encode_sideband_req(const struct > drm_dp_sideband_msg_req_body *req, > idx++; > } > break; > + case DP_CONNECTION_STATUS_NOTIFY: { > + const struct drm_dp_connection_status_notify *msg; > + > + msg = &req->u.conn_stat; > + buf[idx] = (msg->port_number & 0xf) << 4; > + idx++; > + memcpy(&raw->msg[idx], msg->guid, 16); > + idx += 16; > + raw->msg[idx] = 0; > + raw->msg[idx] |= msg->legacy_device_plug_status ? BIT(6) : > 0; > + raw->msg[idx] |= msg->displayport_device_plug_status ? 
> BIT(5) : 0; > + raw->msg[idx] |= msg->message_capability_status ? BIT(4) : > 0; > + raw->msg[idx] |= msg->input_port ? BIT(3) : 0; > + raw->msg[idx] |= FIELD_PREP(GENMASK(2, 0), msg- > >peer_device_type); > + idx++; > + break; > + } > + case DP_RESOURCE_STATUS_NOTIFY: { > + const struct drm_dp_resource_status_notify *msg; > + > + msg = &req->u.resource_stat; > + buf[idx] = (msg->port_number & 0xf) << 4; > + idx++; > + memcpy(&raw->msg[idx], msg->guid, 16); > + idx += 16; > + buf[idx] = (msg->available_pbn & 0xff00) >> 8; > + idx++; > + buf[idx] = (msg->available_pbn & 0xff); > + idx++; > + break; > + } > } > raw->cur_len = idx; > } > @@ -672,6 +703,22 @@ drm_dp_dump_sideband_msg_req_body(const struct > drm_dp_sideband_msg_req_body *req > req->u.enc_status.stream_behavior, > req->u.enc_status.valid_stream_behavior); > break; > + case DP_CONNECTION_STATUS_NOTIFY: > + P("port=%d guid=%*ph legacy=%d displayport=%d messaging=%d > input=%d peer_type=%d", > + req->u.conn_stat.port_number, > + (int)ARRAY_SIZE(req->u.conn_stat.guid), req- > >u.conn_stat.guid, > + req->u.conn_stat.legacy_device_plug_status, > + req->u.conn_stat.displayport_device_plug_status, > + req->u.conn_stat.message_capability_status, > + req->u.conn_stat.input_port, > + req->u.conn_stat.peer_device_type); > + break; > + case DP_RESOURCE_STATUS_NOTIFY: > + P("port=%d guid=%*ph pbn=%d", > + req->u.resource_stat.port_number, > + (int)ARRAY_SIZE(req->u.resource_stat.guid), req- > >u.resource_stat.guid, > + req->u.resource_stat.available_pbn); > + break; > default: > P("???\n"); > break; > @@ -1116,9 +1163,9 @@ static bool > drm_dp_sideband_parse_resource_status_notify(const struct drm_dp_mst > return false; > } > > -static bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr > *mgr, > - struct drm_dp_sideband_msg_rx *raw, > - struct drm_dp_sideband_msg_req_body > *msg) > +bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr, > + struct drm_dp_sideband_msg_rx *raw, > + 
struct drm_dp_sideband_msg_req_body *msg) > { > memset(msg, 0, sizeof(*msg)); >
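For reference, here is a userspace C sketch of the bit layout the new DP_CONNECTION_STATUS_NOTIFY encoder writes, which drm_dp_sideband_parse_req() must read back. It is a hypothetical, simplified mirror of the patch. The struct conn_stat_notify type and the encode_conn_stat()/decode_conn_stat() helpers are illustrative names, not kernel API. The request-type byte that precedes the body is left out, and BIT() is redefined locally because the kernel header is not available in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's BIT() helper. */
#define BIT(n) (1u << (n))

/* 1 port byte + 16 GUID bytes + 1 status byte (sketch assumption). */
#define CONN_STAT_BODY_LEN 18

/* Hypothetical mirror of struct drm_dp_connection_status_notify. */
struct conn_stat_notify {
	uint8_t port_number;            /* 4 bits on the wire */
	uint8_t guid[16];
	bool legacy_device_plug_status;
	bool displayport_device_plug_status;
	bool message_capability_status;
	bool input_port;
	uint8_t peer_device_type;       /* 3 bits on the wire */
};

/* Encode the message body; returns the number of bytes written. */
static size_t encode_conn_stat(const struct conn_stat_notify *msg, uint8_t *buf)
{
	size_t idx = 0;

	assert(msg->port_number <= 0xf);
	buf[idx++] = (msg->port_number & 0xf) << 4;
	memcpy(&buf[idx], msg->guid, 16);
	idx += 16;
	buf[idx] = 0;
	buf[idx] |= msg->legacy_device_plug_status ? BIT(6) : 0;
	buf[idx] |= msg->displayport_device_plug_status ? BIT(5) : 0;
	buf[idx] |= msg->message_capability_status ? BIT(4) : 0;
	buf[idx] |= msg->input_port ? BIT(3) : 0;
	buf[idx] |= msg->peer_device_type & 0x7;   /* GENMASK(2, 0) */
	idx++;
	return idx;
}

/* Decode the body; rejects truncated input as the parser should. */
static bool decode_conn_stat(const uint8_t *buf, size_t len,
			     struct conn_stat_notify *msg)
{
	if (len < CONN_STAT_BODY_LEN)
		return false;

	memset(msg, 0, sizeof(*msg));
	msg->port_number = buf[0] >> 4;
	memcpy(msg->guid, &buf[1], 16);
	msg->legacy_device_plug_status      = buf[17] & BIT(6);
	msg->displayport_device_plug_status = buf[17] & BIT(5);
	msg->message_capability_status      = buf[17] & BIT(4);
	msg->input_port                     = buf[17] & BIT(3);
	msg->peer_device_type               = buf[17] & 0x7;
	return true;
}
```

An encode/decode round trip in this style is what the expanded self-test helper checks. A truncated buffer is what the "clearly-invalid" helper is meant to see rejected.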
Re: [RFC PATCH 24/97] drm/i915/guc: Add flag for mark broken CTB
On Thu, May 06, 2021 at 12:13:38PM -0700, Matthew Brost wrote: > From: Michal Wajdeczko > > Once CTB descriptor is found in error state, either set by GuC > or us, there is no need continue checking descriptor any more, > we can rely on our internal flag. > > Signed-off-by: Michal Wajdeczko > Signed-off-by: Matthew Brost Reviewed-by: Matthew Brost > Cc: Piotr Piórkowski > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 13 +++-- > drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 2 ++ > 2 files changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > index 1afdeac683b5..178f73ab2c96 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > @@ -123,6 +123,7 @@ static void guc_ct_buffer_desc_init(struct > guc_ct_buffer_desc *desc, > > static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 > cmds_addr) > { > + ctb->broken = false; > guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size); > } > > @@ -365,9 +366,12 @@ static int ct_write(struct intel_guc_ct *ct, > u32 *cmds = ctb->cmds; > unsigned int i; > > - if (unlikely(desc->is_in_error)) > + if (unlikely(ctb->broken)) > return -EPIPE; > > + if (unlikely(desc->is_in_error)) > + goto corrupted; > + > if (unlikely(!IS_ALIGNED(head | tail, 4) || >(tail | head) >= size)) > goto corrupted; > @@ -423,6 +427,7 @@ static int ct_write(struct intel_guc_ct *ct, > CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n", >desc->addr, desc->head, desc->tail, desc->size); > desc->is_in_error = 1; > + ctb->broken = true; > return -EPIPE; > } > > @@ -608,9 +613,12 @@ static int ct_read(struct intel_guc_ct *ct, struct > ct_incoming_msg **msg) > unsigned int i; > u32 header; > > - if (unlikely(desc->is_in_error)) > + if (unlikely(ctb->broken)) > return -EPIPE; > > + if (unlikely(desc->is_in_error)) > + goto corrupted; > + > if (unlikely(!IS_ALIGNED(head | tail, 4) || 
>(tail | head) >= size)) > goto corrupted; > @@ -674,6 +682,7 @@ static int ct_read(struct intel_guc_ct *ct, struct > ct_incoming_msg **msg) > CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n", >desc->addr, desc->head, desc->tail, desc->size); > desc->is_in_error = 1; > + ctb->broken = true; > return -EPIPE; > } > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h > b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h > index cb222f202301..7d3cd375d6a7 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h > @@ -32,12 +32,14 @@ struct intel_guc; > * @desc: pointer to the buffer descriptor > * @cmds: pointer to the commands buffer > * @size: size of the commands buffer > + * @broken: flag to indicate if descriptor data is broken > */ > struct intel_guc_ct_buffer { > spinlock_t lock; > struct guc_ct_buffer_desc *desc; > u32 *cmds; > u32 size; > + bool broken; > }; > > > -- > 2.28.0 >
[PATCH] Revert "i915: use io_mapping_map_user"
This reverts commit b739f125e4ebd73d10ed30a856574e13649119ed. We are unfortunately seeing more issues like we did in 293837b9ac8d ("Revert "i915: fix remap_io_sg to verify the pgprot""), except this is now for the vm_fault_gtt path, where we are now hitting the same BUG_ON(!pte_none(*pte)): [10887.466150] kernel BUG at mm/memory.c:2183! [10887.466162] invalid opcode: [#1] PREEMPT SMP PTI [10887.466168] CPU: 0 PID: 7775 Comm: ffmpeg Tainted: G U 5.13.0-rc3-CI-Nightly #1 [10887.466174] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.40 07/14/2017 [10887.466177] RIP: 0010:remap_pfn_range_notrack+0x30f/0x440 [10887.466188] Code: e8 96 d7 e0 ff 84 c0 0f 84 27 01 00 00 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 e0 48 c1 e0 0c 4d 85 ed 75 96 48 21 d0 31 f6 eb a9 <0f> 0b 48 39 37 0f 85 0e 01 00 00 48 8b 0c 24 48 39 4f 08 0f 85 00 [10887.466193] RSP: 0018:c90006e33c50 EFLAGS: 00010286 [10887.466198] RAX: 802f RBX: 7f5e0180 RCX: 0028 [10887.466201] RDX: 0001 RSI: ea00 RDI: [10887.466204] RBP: ea33fea8 R08: 802f R09: 8881072256e0 [10887.466207] R10: c9000b84fff8 R11: 17dab000 R12: 00089f9f [10887.466210] R13: 802f R14: 7f5e017e4000 R15: 88800cffaf20 [10887.466213] FS: 7f5e04849640() GS:88827800() knlGS: [10887.466216] CS: 0010 DS: ES: CR0: 80050033 [10887.466220] CR2: 7fd9b191a2ac CR3: 0001829ac000 CR4: 003506f0 [10887.466223] Call Trace: [10887.466233] vm_fault_gtt+0x1ca/0x5d0 [i915] [10887.466381] ? ktime_get+0x38/0x90 [10887.466389] __do_fault+0x37/0x90 [10887.466395] __handle_mm_fault+0xc46/0x1200 [10887.466402] handle_mm_fault+0xce/0x2a0 [10887.466407] do_user_addr_fault+0x1c5/0x660 Reverting this commit is reported to fix the issue. 
Reported-by: Eero Tamminen References: https://gitlab.freedesktop.org/drm/intel/-/issues/3519 Fixes: b739f125e4eb ("i915: use io_mapping_map_user") Cc: Christoph Hellwig Cc: Daniel Vetter Signed-off-by: Matthew Auld --- drivers/gpu/drm/i915/Kconfig | 1 - drivers/gpu/drm/i915/gem/i915_gem_mman.c | 9 ++--- drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/i915_mm.c | 44 4 files changed, 52 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig index 93f4d059fc89..1e1cb245fca7 100644 --- a/drivers/gpu/drm/i915/Kconfig +++ b/drivers/gpu/drm/i915/Kconfig @@ -20,7 +20,6 @@ config DRM_I915 select INPUT if ACPI select ACPI_VIDEO if ACPI select ACPI_BUTTON if ACPI - select IO_MAPPING select SYNC_FILE select IOSF_MBI select CRC32 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index f6fe5cb01438..8598a1c78a4c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -367,10 +367,11 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) goto err_unpin; /* Finally, remap it using the new GTT offset */ - ret = io_mapping_map_user(&ggtt->iomap, area, area->vm_start + - (vma->ggtt_view.partial.offset << PAGE_SHIFT), - (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT, - min_t(u64, vma->size, area->vm_end - area->vm_start)); + ret = remap_io_mapping(area, + area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT), + (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT, + min_t(u64, vma->size, area->vm_end - area->vm_start), + &ggtt->iomap); if (ret) goto err_fence; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0f6d27da69ac..e926f20c5b82 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1941,6 +1941,9 @@ int i915_reg_read_ioctl(struct drm_device *dev, void *data, struct drm_file *file); /* i915_mm.c */ +int remap_io_mapping(struct vm_area_struct 
*vma, +unsigned long addr, unsigned long pfn, unsigned long size, +struct io_mapping *iomap); int remap_io_sg(struct vm_area_struct *vma, unsigned long addr, unsigned long size, struct scatterlist *sgl, resource_size_t iobase); diff --git a/drivers/gpu/drm/i915/i915_mm.c b/drivers/gpu/drm/i915/i915_mm.c index 9a777b0ff59b..666808cb3a32 100644 --- a/drivers/gpu/drm/i915/i915_mm.c +++ b/drivers/gpu/drm/i915/i915_mm.c @@ -37,6 +37,17 @@ struct remap_pfn { resource_size_t iobase; }; +static int remap_pfn(pte_t *pte, unsigned long addr, void *data) +{ + struct remap_pfn *r = data;
Re: [PATCH 20/29] drm/i915/gem: Make an alignment check more sensible
On Thu, May 27, 2021 at 11:26:41AM -0500, Jason Ekstrand wrote: > What we really want to check is that size of the engines array, i.e. > args->size - sizeof(*user) is divisible by the element size, i.e. > sizeof(*user->engines) because that's what's required for computing the > array length right below the check. However, we're currently not doing > this and instead doing a compile-time check that sizeof(*user) is > divisible by sizeof(*user->engines) and avoiding the subtraction. As > far as I can tell, the only reason for the more confusing pair of checks > is to avoid a single subtraction of a constant. > > The other thing the BUILD_BUG_ON might be trying to implicitly check is > that offsetof(user->engines) == sizeof(*user) and we don't have any > weird padding throwing us off. However, that's not the check it's doing > and it's not even a reliable way to do that check. > > Signed-off-by: Jason Ekstrand Yeah a non-dense compiler should be able to figure this out, plus set_engines isn't a hotpath. Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 12a148ba421b6..cf7c281977a3e 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -1758,9 +1758,8 @@ set_engines(struct i915_gem_context *ctx, > goto replace; > } > > - BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->engines))); > if (args->size < sizeof(*user) || > - !IS_ALIGNED(args->size, sizeof(*user->engines))) { > + !IS_ALIGNED(args->size - sizeof(*user), sizeof(*user->engines))) { > drm_dbg(&i915->drm, "Invalid size for engine array: %d\n", > args->size); > return -EINVAL; > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
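The userspace sketch below shows why the new check is the one the array-length computation actually needs. The two structs are hypothetical mocks of i915_context_param_engines and i915_engine_class_instance, giving an 8-byte header followed by 4-byte entries on common ABIs. engines_count() is an illustrative helper, not i915 code. The old pair of checks gives the same answer only because the header size happens to be a multiple of the entry size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of struct i915_engine_class_instance. */
struct engine_class_instance {
	uint16_t engine_class;
	uint16_t engine_instance;
};

/* Mock of struct i915_context_param_engines: header + flexible array. */
struct param_engines {
	uint64_t extensions;
	struct engine_class_instance engines[];
};

/*
 * Number of engines described by a user buffer of 'size' bytes, or -1 if
 * the buffer is not a header followed by a whole number of entries.
 */
static long engines_count(size_t size)
{
	const size_t hdr = sizeof(struct param_engines);
	const size_t elem = sizeof(struct engine_class_instance);

	assert(elem != 0);
	if (size < hdr || (size - hdr) % elem != 0)
		return -1;
	return (long)((size - hdr) / elem);
}
```

With these mocks, engines_count(8) is 0, engines_count(16) is 2, and a 10-byte buffer is rejected.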
Re: [PATCH 19/29] drm/i915: Add an i915_gem_vm_lookup helper
On Thu, May 27, 2021 at 11:26:40AM -0500, Jason Ekstrand wrote: > This is the VM equivalent of i915_gem_context_lookup. It's only used > once in this patch but future patches will need to duplicate this lookup > code so it's better to have it in a helper. > > Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +- > drivers/gpu/drm/i915/i915_drv.h | 14 ++ > 2 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index d247fb223aac7..12a148ba421b6 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -1346,11 +1346,7 @@ static int set_ppgtt(struct drm_i915_file_private > *file_priv, > if (upper_32_bits(args->value)) > return -ENOENT; > > - rcu_read_lock(); > - vm = xa_load(&file_priv->vm_xa, args->value); > - if (vm && !kref_get_unless_zero(&vm->ref)) > - vm = NULL; > - rcu_read_unlock(); > + vm = i915_gem_vm_lookup(file_priv, args->value); > if (!vm) > return -ENOENT; > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 48316d273af66..fee2342219da1 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1871,6 +1871,20 @@ i915_gem_context_lookup(struct drm_i915_file_private > *file_priv, u32 id) > return ctx; > } > > +static inline struct i915_address_space * > +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id) > +{ > + struct i915_address_space *vm; > + > + rcu_read_lock(); > + vm = xa_load(&file_priv->vm_xa, id); > + if (vm && !kref_get_unless_zero(&vm->ref)) > + vm = NULL; > + rcu_read_unlock(); > + > + return vm; > +} > + > /* i915_gem_evict.c */ > int __must_check i915_gem_evict_something(struct i915_address_space *vm, > u64 min_size, u64 alignment, > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
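The core of the new helper is the "get unless zero" idiom. An object can still be found in the xarray while its last reference is being dropped, so the lookup may take a reference only if the count is non-zero. The following is a minimal C11-atomics sketch. It uses hypothetical struct obj and lookup() names in place of i915_address_space and the xarray. It also leaves out RCU, so the caller is assumed to keep the table entries alive for the duration of the lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	atomic_int ref;
};

/* Same semantics as the kernel's kref_get_unless_zero(). */
static bool get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Hypothetical lookup: 'table' stands in for the xarray. */
static struct obj *lookup(struct obj **table, unsigned int n, unsigned int id)
{
	struct obj *o;

	assert(table != NULL);
	o = id < n ? table[id] : NULL;
	if (o && !get_unless_zero(o))
		o = NULL;       /* being torn down: treat as not found */
	return o;
}
```

A successful lookup returns with an extra reference held, which the caller must later drop (i915_vm_put() in the real code).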
Re: [Intel-gfx] [PATCH 06/18] drm/i915/guc: Drop guc->interrupts.enabled
On Thu, May 27, 2021 at 10:17:20AM -0700, John Harrison wrote: > On 5/25/2021 23:42, Matthew Brost wrote: > > Drop the variable guc->interrupts.enabled as this variable is just > > leading to bugs creeping into the code. > > > > e.g. A full GPU reset disables the GuC interrupts but forgot to clear > > guc->interrupts.enabled, guc->interrupts.enabled being true suppresses > > interrupts from getting re-enabled and now we are broken. > > > > It is harmless to enable interrupt while already enabled so let's just > > delete this variable to avoid bugs like this going forward. > Is it worth leaving the enabled flag in place but only using it to trip a > WARN to catch such cases in a less catastrophic manner? Or are there valid > reasons for calling enable when already enabled? > I don't think so: as mentioned above, a reset disables these interrupts, and if we didn't clear this field the WARN_ON would be triggered, making CI unhappy. Yes, the bug would be less catastrophic, but we'd still have to waste time and energy chasing it. Matt > Either way, it seems like a plausible change and CI is happy with it, so: > Reviewed-by: John Harrison > > John.
> > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gt/uc/intel_guc.c | 27 +- > > drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 - > > 2 files changed, 9 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > index ab2c8fe8cdfa..18da9ed15728 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > @@ -96,12 +96,9 @@ static void gen9_enable_guc_interrupts(struct intel_guc > > *guc) > > assert_rpm_wakelock_held(>->i915->runtime_pm); > > spin_lock_irq(>->irq_lock); > > - if (!guc->interrupts.enabled) { > > - WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) & > > -gt->pm_guc_events); > > - guc->interrupts.enabled = true; > > - gen6_gt_pm_enable_irq(gt, gt->pm_guc_events); > > - } > > + WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) & > > +gt->pm_guc_events); > > + gen6_gt_pm_enable_irq(gt, gt->pm_guc_events); > > spin_unlock_irq(>->irq_lock); > > } > > @@ -112,7 +109,6 @@ static void gen9_disable_guc_interrupts(struct > > intel_guc *guc) > > assert_rpm_wakelock_held(>->i915->runtime_pm); > > spin_lock_irq(>->irq_lock); > > - guc->interrupts.enabled = false; > > gen6_gt_pm_disable_irq(gt, gt->pm_guc_events); > > @@ -134,18 +130,14 @@ static void gen11_reset_guc_interrupts(struct > > intel_guc *guc) > > static void gen11_enable_guc_interrupts(struct intel_guc *guc) > > { > > struct intel_gt *gt = guc_to_gt(guc); > > + u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST); > > spin_lock_irq(>->irq_lock); > > - if (!guc->interrupts.enabled) { > > - u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST); > > - > > - WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC)); > > - intel_uncore_write(gt->uncore, > > - GEN11_GUC_SG_INTR_ENABLE, events); > > - intel_uncore_write(gt->uncore, > > - GEN11_GUC_SG_INTR_MASK, ~events); > > - guc->interrupts.enabled = true; > > - } > > + 
WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC)); > > + intel_uncore_write(gt->uncore, > > + GEN11_GUC_SG_INTR_ENABLE, events); > > + intel_uncore_write(gt->uncore, > > + GEN11_GUC_SG_INTR_MASK, ~events); > > spin_unlock_irq(>->irq_lock); > > } > > @@ -154,7 +146,6 @@ static void gen11_disable_guc_interrupts(struct > > intel_guc *guc) > > struct intel_gt *gt = guc_to_gt(guc); > > spin_lock_irq(>->irq_lock); > > - guc->interrupts.enabled = false; > > intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_MASK, ~0); > > intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_ENABLE, 0); > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > index c20f3839de12..4abc59f6f3cd 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > @@ -33,7 +33,6 @@ struct intel_guc { > > unsigned int msg_enabled_mask; > > struct { > > - bool enabled; > > void (*reset)(struct intel_guc *guc); > > void (*enable)(struct intel_guc *guc); > > void (*disable)(struct intel_guc *guc); >
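The failure mode described in the commit message can be modelled in a few lines. This is a hypothetical userspace sketch: struct hw, old_enable(), gpu_reset() and new_enable() are illustrative, not i915 functions. If a reset forgets to clear the software shadow flag, the guarded enable becomes a no-op. The unguarded, idempotent enable recovers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the interrupt-enable state held by the hardware. */
struct hw {
	bool irq_enabled;
};

/* Old scheme: a software shadow of the hardware state. */
struct guc_old {
	struct hw *hw;
	bool enabled;
};

static void old_enable(struct guc_old *g)
{
	if (!g->enabled) {              /* trusts the shadow flag */
		g->hw->irq_enabled = true;
		g->enabled = true;
	}
}

/* Full GPU reset: the hardware loses its enables, the shadow flag does not. */
static void gpu_reset(struct hw *hw)
{
	hw->irq_enabled = false;
}

/* New scheme: enabling twice is harmless, so always program the hardware. */
static void new_enable(struct hw *hw)
{
	assert(hw != NULL);
	hw->irq_enabled = true;
}
```

Dropping the flag removes the state that can go stale. John's suggestion would keep the flag only to trip a WARN, but per Matt's reply the reset path would then trigger that WARN.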
[Bug 212957] [radeon] kernel NULL pointer dereference during system boot
https://bugzilla.kernel.org/show_bug.cgi?id=212957 Dennis Foster (m...@dennisfoster.us) changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |PATCH_ALREADY_AVAILABLE --- Comment #5 from Dennis Foster (m...@dennisfoster.us) --- The issue is now resolved in kernel version 5.12.7 Link to the patch commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.12.y&id=ec1bd01b632ad748dce8a0eeb4c167bead71315f
Re: [Intel-gfx] [PATCH 06/18] drm/i915/guc: Drop guc->interrupts.enabled
On 5/25/2021 23:42, Matthew Brost wrote: Drop the variable guc->interrupts.enabled as this variable is just leading to bugs creeping into the code. e.g. A full GPU reset disables the GuC interrupts but forgot to clear guc->interrupts.enabled, guc->interrupts.enabled being true suppresses interrupts from getting re-enabled and now we are broken. It is harmless to enable interrupt while already enabled so let's just delete this variable to avoid bugs like this going forward. Is it worth leaving the enabled flag in place but only using it to trip a WARN to catch such cases in a less catastrophic manner? Or are there valid reasons for calling enable when already enabled? Either way, it seems like a plausible change and CI is happy with it, so: Reviewed-by: John Harrison John. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 27 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 - 2 files changed, 9 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index ab2c8fe8cdfa..18da9ed15728 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -96,12 +96,9 @@ static void gen9_enable_guc_interrupts(struct intel_guc *guc) assert_rpm_wakelock_held(>->i915->runtime_pm); spin_lock_irq(>->irq_lock); - if (!guc->interrupts.enabled) { - WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) & -gt->pm_guc_events); - guc->interrupts.enabled = true; - gen6_gt_pm_enable_irq(gt, gt->pm_guc_events); - } + WARN_ON_ONCE(intel_uncore_read(gt->uncore, GEN8_GT_IIR(2)) & +gt->pm_guc_events); + gen6_gt_pm_enable_irq(gt, gt->pm_guc_events); spin_unlock_irq(>->irq_lock); } @@ -112,7 +109,6 @@ static void gen9_disable_guc_interrupts(struct intel_guc *guc) assert_rpm_wakelock_held(>->i915->runtime_pm); spin_lock_irq(>->irq_lock); - guc->interrupts.enabled = false; gen6_gt_pm_disable_irq(gt, gt->pm_guc_events); @@ -134,18 +130,14 @@ static void 
gen11_reset_guc_interrupts(struct intel_guc *guc) static void gen11_enable_guc_interrupts(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); + u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST); spin_lock_irq(>->irq_lock); - if (!guc->interrupts.enabled) { - u32 events = REG_FIELD_PREP(ENGINE1_MASK, GUC_INTR_GUC2HOST); - - WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC)); - intel_uncore_write(gt->uncore, - GEN11_GUC_SG_INTR_ENABLE, events); - intel_uncore_write(gt->uncore, - GEN11_GUC_SG_INTR_MASK, ~events); - guc->interrupts.enabled = true; - } + WARN_ON_ONCE(gen11_gt_reset_one_iir(gt, 0, GEN11_GUC)); + intel_uncore_write(gt->uncore, + GEN11_GUC_SG_INTR_ENABLE, events); + intel_uncore_write(gt->uncore, + GEN11_GUC_SG_INTR_MASK, ~events); spin_unlock_irq(>->irq_lock); } @@ -154,7 +146,6 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc) struct intel_gt *gt = guc_to_gt(guc); spin_lock_irq(>->irq_lock); - guc->interrupts.enabled = false; intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_MASK, ~0); intel_uncore_write(gt->uncore, GEN11_GUC_SG_INTR_ENABLE, 0); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index c20f3839de12..4abc59f6f3cd 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -33,7 +33,6 @@ struct intel_guc { unsigned int msg_enabled_mask; struct { - bool enabled; void (*reset)(struct intel_guc *guc); void (*enable)(struct intel_guc *guc); void (*disable)(struct intel_guc *guc);
Re: [Intel-gfx] [RFC PATCH 60/97] drm/i915: Track 'serial' counts for virtual engines
On 5/27/2021 01:53, Tvrtko Ursulin wrote: On 26/05/2021 19:45, John Harrison wrote: On 5/26/2021 01:40, Tvrtko Ursulin wrote: On 25/05/2021 18:52, Matthew Brost wrote: On Tue, May 25, 2021 at 11:16:12AM +0100, Tvrtko Ursulin wrote: On 06/05/2021 20:14, Matthew Brost wrote: From: John Harrison The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted to their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heart beat code thinks the physical engines are idle due to the serial number not incrementing. This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions. Commit text sounds a bit defeatist. I think instead of making up the serial counts, which has downsides (could you please document in the commit what they are), we should think how to design things properly. IMO, I don't think fixing serial counts is the scope of this series. We should focus on getting GuC submission in not cleaning up all the crap that is in the i915. Let's make a note of this though so we can revisit later. I will say again - commit message implies it is introducing an unspecified downside by not fully fixing an also unspecified issue. It is completely reasonable, and customary even, to ask for both to be documented in the commit message. Not sure what exactly is 'unspecified'. I thought the commit message described both the problem (heartbeat not running when using virtual engines) and the result (heartbeat running on more engines than strictly necessary). 
But in greater detail... The serial number tracking is a hack for the heartbeat code to know whether an engine is busy or idle, and therefore whether it should be pinged for aliveness. Whenever a submission is made to an engine, the serial number is incremented. The heartbeat code keeps a copy of the value. If the value has changed, the engine is busy and needs to be pinged. This works fine for execlist mode where virtual engine decomposition is done inside i915. It fails miserably for GuC mode where the decomposition is done by the hardware. The reason being that the heartbeat code only looks at physical engines but the serial count is only incremented on the virtual engine. Thus, the heartbeat sees everything as idle and does not ping. So hangcheck does not work. Or it works because GuC does it anyway. Either way, that's one thing to explicitly state in the commit message. This patch decomposes the virtual engines for the sake of incrementing the serial count on each sub-engine in order to keep the heartbeat code happy. The downside is that now the heartbeat sees all sub-engines as busy rather than only the one the submission actually ends up on. There really isn't much that can be done about that. The heartbeat code is in i915 not GuC, the scheduler is in GuC not i915. The only way to improve it is to either move the heartbeat code into GuC as well and completely disable the i915 side, or add some way for i915 to interrogate GuC as to which engines are or are not active. Technically, we do have both. GuC has (or at least had) an option to force a context switch on every execution quantum pre-emption. However, that is much, much, more heavy weight than the heartbeat. For the latter, we do (almost) have the engine usage statistics for PMU and such like. I'm not sure how much effort it would be to wire that up to the heartbeat code instead of using the serial count. 
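The serial-count scheme described above can be sketched roughly as follows. This is a simplified, hypothetical model: engines[], submit_virtual() and heartbeat_should_ping() are illustrative names, not the i915 heartbeat code. Each submission to a virtual engine bumps the serial of every physical sibling in its mask. The heartbeat pings an engine only if its serial has moved since the last check. The model also shows the downside under discussion: siblings that the GuC never actually used still look busy.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ENGINES 4   /* hypothetical number of physical engines */

struct engine {
	unsigned long serial;           /* bumped on every submission */
	unsigned long hb_serial;        /* last value the heartbeat saw */
};

static struct engine engines[NUM_ENGINES];

/*
 * GuC mode: the request targets a mask of physical engines and the GuC
 * picks one. i915 cannot know which, so it bumps every sibling.
 */
static void submit_virtual(unsigned int mask)
{
	assert(mask != 0 && mask < (1u << NUM_ENGINES));
	for (unsigned int i = 0; i < NUM_ENGINES; i++)
		if (mask & (1u << i))
			engines[i].serial++;
}

/* Ping only if something was submitted since the previous check. */
static bool heartbeat_should_ping(struct engine *e)
{
	bool busy = e->serial != e->hb_serial;

	e->hb_serial = e->serial;
	return busy;
}
```

After submit_virtual(0x3), both engine 0 and engine 1 are pinged even though the GuC ran the request on only one of them. This is the "heartbeats sent to idle engines" cost.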
In short, the serial count is ever so slightly inefficient in that it causes heartbeat pings on engines which are idle. On the other hand, it is way more efficient and simpler than the current alternatives. And the hack to make hangcheck work creates this inefficiency where heartbeats are sent to idle engines. Which is probably fine just needs to be explained. Does that answer the questions? With the two points I re-raise clearly explained, possibly even patch title changed, yeah. I am just wanting for it to be more easily obvious to patch reader what it is functionally about - not just what implementation details have been change but why as well. My understanding is that we don't explain every piece of code in minute detail in every checkin email that touches it. I thought my description was already pretty verbose. I've certainly seen way less informative checkins that apparently made it through review without issue. Regarding the problem statement, I thought this was fairly clear that the heart
Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions
On 5/27/21 9:41 AM, Tom Lendacky wrote: > On 5/27/21 8:02 AM, Christoph Hellwig wrote: >> On Wed, May 19, 2021 at 11:50:07AM -0700, Florian Fainelli wrote: >>> You convert this call site with swiotlb_init_io_tlb_mem() which did not >>> do the set_memory_decrypted()+memset(). Is this okay or should >>> swiotlb_init_io_tlb_mem() add an additional argument to do this >>> conditionally? >> >> The zeroing is useful and was missing before. I think having a clean >> state here is the right thing. >> >> Not sure about the set_memory_decrypted, swiotlb_update_mem_attributes >> kinda suggests it is too early to set the memory decrupted. >> >> Adding Tom who should now about all this. > > The reason for adding swiotlb_update_mem_attributes() was because having > the call to set_memory_decrypted() in swiotlb_init_with_tbl() triggered a > BUG_ON() related to interrupts not being enabled yet during boot. So that > call had to be delayed until interrupts were enabled. I pulled down and tested the patch set and booted with SME enabled. The following was seen during the boot: [0.134184] BUG: Bad page state in process swapper pfn:108002 [0.134196] page:(ptrval) refcount:0 mapcount:-128 mapping: index:0x0 pfn:0x108002 [0.134201] flags: 0x17c000(node=0|zone=2|lastcpupid=0x1f) [0.134208] raw: 0017c000 88847f355e28 88847f355e28 [0.134210] raw: 0001 ff7f [0.134212] page dumped because: nonzero mapcount [0.134213] Modules linked in: [0.134218] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc2-sos-custom #3 [0.134221] Hardware name: ... [0.134224] Call Trace: [0.134233] dump_stack+0x76/0x94 [0.134244] bad_page+0xa6/0xf0 [0.134252] __free_pages_ok+0x331/0x360 [0.134256] memblock_free_all+0x158/0x1c1 [0.134267] mem_init+0x1f/0x14c [0.134273] start_kernel+0x290/0x574 [0.134279] secondary_startup_64_no_verify+0xb0/0xbb I see this about 40 times during the boot, each with a different PFN. 
The system boots (which seemed odd), but I don't know if there will be side effects to this (I didn't stress the system). I modified the code to add a flag to not do the set_memory_decrypted(), as suggested by Florian, when invoked from swiotlb_init_with_tbl(), and that eliminated the bad page state BUG. Thanks, Tom > > Thanks, > Tom > >>
RE: [PATCH v2 2/2] drm/kmb: Do not report 0 (success) in case of error
This is already fixed in the patch from Zhen Lei. > -Original Message- > From: Christophe JAILLET > Sent: Wednesday, May 26, 2021 11:10 PM > To: Chrisanthus, Anitha ; Dea, Edmund J > ; airl...@linux.ie; dan...@ffwll.ch; > s...@ravnborg.org > Cc: dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; kernel- > janit...@vger.kernel.org; Christophe JAILLET > Subject: [PATCH v2 2/2] drm/kmb: Do not report 0 (success) in case of error > > 'ret' is known to be 0 at this point. > Reporting the error from the previous 'platform_get_irq()' call is likely, > so add the missing assignment. > > Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display") > Signed-off-by: Christophe JAILLET > --- > v2: New patch > --- > drivers/gpu/drm/kmb/kmb_drv.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/kmb/kmb_drv.c > b/drivers/gpu/drm/kmb/kmb_drv.c > index fa28e42da460..d9e10ac9847c 100644 > --- a/drivers/gpu/drm/kmb/kmb_drv.c > +++ b/drivers/gpu/drm/kmb/kmb_drv.c > @@ -138,6 +138,7 @@ static int kmb_hw_init(struct drm_device *drm, > unsigned long flags) > irq_lcd = platform_get_irq(pdev, 0); > if (irq_lcd < 0) { > drm_err(&kmb->drm, "irq_lcd not found"); > + ret = irq_lcd; > goto setup_fail; > } > > -- > 2.30.2
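The bug fixed here, and per the reply above also by Zhen Lei's patch, is an error path that jumps to the cleanup label without setting the return value. The following hypothetical userspace sketch illustrates it. fake_get_irq() stands in for platform_get_irq(), and -ENXIO is used only as an example errno.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for platform_get_irq(): an IRQ number or a negative errno. */
static int fake_get_irq(int available)
{
	return available ? 42 : -ENXIO;
}

static int hw_init_buggy(int irq_available)
{
	int ret = 0;
	int irq = fake_get_irq(irq_available);

	if (irq < 0)
		goto setup_fail;        /* BUG: ret is still 0 */
	return 0;

setup_fail:
	return ret;                     /* caller sees "success" */
}

static int hw_init_fixed(int irq_available)
{
	int ret = 0;
	int irq = fake_get_irq(irq_available);

	if (irq < 0) {
		ret = irq;              /* propagate the negative errno */
		goto setup_fail;
	}
	return 0;

setup_fail:
	assert(ret < 0);                /* an error path must not report success */
	return ret;
}
```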
[PATCH 29/29] drm/i915/gem: Roll all of context creation together
Now that we have the whole engine set and VM at context creation time, we can just assign those fields instead of creating first and handling the VM and engines later. This lets us avoid creating useless VMs and engine sets and lets us get rid of the complex VM setting code. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 159 ++ .../gpu/drm/i915/gem/selftests/mock_context.c | 33 ++-- 2 files changed, 64 insertions(+), 128 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index e6a6ead477ff4..502a2bd1a043e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1298,56 +1298,6 @@ static int __context_set_persistence(struct i915_gem_context *ctx, bool state) return 0; } -static struct i915_gem_context * -__create_context(struct drm_i915_private *i915, -const struct i915_gem_proto_context *pc) -{ - struct i915_gem_context *ctx; - struct i915_gem_engines *e; - int err; - int i; - - ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); - if (!ctx) - return ERR_PTR(-ENOMEM); - - kref_init(&ctx->ref); - ctx->i915 = i915; - ctx->sched = pc->sched; - mutex_init(&ctx->mutex); - INIT_LIST_HEAD(&ctx->link); - - spin_lock_init(&ctx->stale.lock); - INIT_LIST_HEAD(&ctx->stale.engines); - - mutex_init(&ctx->engines_mutex); - e = default_engines(ctx, pc->legacy_rcs_sseu); - if (IS_ERR(e)) { - err = PTR_ERR(e); - goto err_free; - } - RCU_INIT_POINTER(ctx->engines, e); - - INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); - mutex_init(&ctx->lut_mutex); - - /* NB: Mark all slices as needing a remap so that when the context first -* loads it will restore whatever remap state already exists. If there -* is no remap info, it will be a NOP. 
*/ - ctx->remap_slice = ALL_L3_SLICES(i915); - - ctx->user_flags = pc->user_flags; - - for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++) - ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES; - - return ctx; - -err_free: - kfree(ctx); - return ERR_PTR(err); -} - static inline struct i915_gem_engines * __context_engines_await(const struct i915_gem_context *ctx, bool *user_engines) @@ -1391,86 +1341,77 @@ context_apply_all(struct i915_gem_context *ctx, i915_sw_fence_complete(&e->fence); } -static void __apply_ppgtt(struct intel_context *ce, void *vm) -{ - i915_vm_put(ce->vm); - ce->vm = i915_vm_get(vm); -} - -static struct i915_address_space * -__set_ppgtt(struct i915_gem_context *ctx, struct i915_address_space *vm) -{ - struct i915_address_space *old; - - old = rcu_replace_pointer(ctx->vm, - i915_vm_open(vm), - lockdep_is_held(&ctx->mutex)); - GEM_BUG_ON(old && i915_vm_is_4lvl(vm) != i915_vm_is_4lvl(old)); - - context_apply_all(ctx, __apply_ppgtt, vm); - - return old; -} - -static void __assign_ppgtt(struct i915_gem_context *ctx, - struct i915_address_space *vm) -{ - if (vm == rcu_access_pointer(ctx->vm)) - return; - - vm = __set_ppgtt(ctx, vm); - if (vm) - i915_vm_close(vm); -} - static struct i915_gem_context * i915_gem_create_context(struct drm_i915_private *i915, const struct i915_gem_proto_context *pc) { struct i915_gem_context *ctx; - int ret; + struct i915_gem_engines *e; + int err; + int i; - ctx = __create_context(i915, pc); - if (IS_ERR(ctx)) - return ctx; + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return ERR_PTR(-ENOMEM); - if (pc->vm) { - mutex_lock(&ctx->mutex); - __assign_ppgtt(ctx, pc->vm); - mutex_unlock(&ctx->mutex); - } + kref_init(&ctx->ref); + ctx->i915 = i915; + ctx->sched = pc->sched; + mutex_init(&ctx->mutex); + INIT_LIST_HEAD(&ctx->link); - if (pc->num_user_engines >= 0) { - struct i915_gem_engines *engines; + spin_lock_init(&ctx->stale.lock); + INIT_LIST_HEAD(&ctx->stale.engines); - engines = user_engines(ctx, 
pc->num_user_engines, - pc->user_engines); - if (IS_ERR(engines)) { - context_close(ctx); - return ERR_CAST(engines); - } + if (pc->vm) + RCU_INIT_POINTER(ctx->vm, i915_vm_open(pc->vm)); - mutex_lock(&ctx->engines_mutex); + mutex_init(&ctx->engines_mutex); + if (pc->num_user_engines >= 0) { i915
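Because the proto-context already carries the VM and engine choice, creation can fill in every field in one pass. The following is a minimal userspace sketch of that shape. `struct vm`, `struct proto_ctx`, `struct ctx`, `create_ctx()`, `destroy_ctx()` and `DEFAULT_ENGINES` are hypothetical simplifications, not i915 types. The `-1` "no user engines" sentinel mirrors `num_user_engines` from the proto-context patches.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the i915 types. */
struct vm {
	int refcount;
};

struct proto_ctx {
	struct vm *vm;		/* NULL: no per-context VM requested */
	int num_user_engines;	/* -1: fall back to the default engine set */
};

struct ctx {
	struct vm *vm;
	int num_engines;
};

#define DEFAULT_ENGINES 4	/* assumption: size of the legacy engine map */

static struct vm *vm_get(struct vm *vm)
{
	vm->refcount++;
	return vm;
}

/* Every field is decided from the proto-context up front, so there is no
 * "create with defaults, then swap in the requested VM/engines" step. */
static struct ctx *create_ctx(const struct proto_ctx *pc)
{
	struct ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	if (pc->vm)
		ctx->vm = vm_get(pc->vm);

	ctx->num_engines = pc->num_user_engines >= 0 ?
			   pc->num_user_engines : DEFAULT_ENGINES;
	return ctx;
}

static void destroy_ctx(struct ctx *ctx)
{
	if (ctx->vm)
		ctx->vm->refcount--;	/* a real put would free at zero */
	free(ctx);
}
```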
[PATCH 28/29] i915/gem/selftests: Assign the VM at context creation in igt_shared_ctx_exec
We want to delete __assign_ppgtt and, generally, stop setting the VM after context creation. This is the one place I could find in the selftests where we set a VM after the fact. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index aee5642818824..01f7615eb3a8a 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -813,16 +813,12 @@ static int igt_shared_ctx_exec(void *arg) struct i915_gem_context *ctx; struct intel_context *ce; - ctx = kernel_context(i915, NULL); + ctx = kernel_context(i915, ctx_vm(parent)); if (IS_ERR(ctx)) { err = PTR_ERR(ctx); goto out_test; } - mutex_lock(&ctx->mutex); - __assign_ppgtt(ctx, ctx_vm(parent)); - mutex_unlock(&ctx->mutex); - ce = i915_gem_context_get_engine(ctx, engine->legacy_idx); GEM_BUG_ON(IS_ERR(ce)); -- 2.31.1
[PATCH 26/29] drm/i915/gem: Don't allow changing the engine set on running contexts (v2)
When the APIs were added to manage the engine set on a GEM context directly from userspace, the questionable choice was made to allow changing the engine set on a context at any time. This is horribly racy and there's absolutely no reason why any userspace would want to do this outside of trying to exercise interesting race conditions. By removing support for CONTEXT_PARAM_ENGINES from ctx_setparam, we make it impossible to change the engine set after the context has been fully created. This doesn't yet let us delete all the deferred engine clean-up code as that's still used for handling the case where the client dies or calls GEM_CONTEXT_DESTROY while work is in flight. However, moving to an API where the engine set is effectively immutable gives us more options to potentially clean that code up a bit going forward. It also removes a whole class of ways in which a client can hurt itself or try to get around kernel context banning. v2 (Jason Ekstrand): - Expand the commit mesage Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 303 1 file changed, 303 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index a528c8f3354a0..e6a6ead477ff4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1819,305 +1819,6 @@ static int set_sseu(struct i915_gem_context *ctx, return ret; } -struct set_engines { - struct i915_gem_context *ctx; - struct i915_gem_engines *engines; -}; - -static int -set_engines__load_balance(struct i915_user_extension __user *base, void *data) -{ - struct i915_context_engines_load_balance __user *ext = - container_of_user(base, typeof(*ext), base); - const struct set_engines *set = data; - struct drm_i915_private *i915 = set->ctx->i915; - struct intel_engine_cs *stack[16]; - struct intel_engine_cs **siblings; - struct intel_context *ce; - struct intel_sseu null_sseu = {}; - u16 num_siblings, idx; - 
unsigned int n; - int err; - - if (!HAS_EXECLISTS(i915)) - return -ENODEV; - - if (intel_uc_uses_guc_submission(&i915->gt.uc)) - return -ENODEV; /* not implement yet */ - - if (get_user(idx, &ext->engine_index)) - return -EFAULT; - - if (idx >= set->engines->num_engines) { - drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n", - idx, set->engines->num_engines); - return -EINVAL; - } - - idx = array_index_nospec(idx, set->engines->num_engines); - if (set->engines->engines[idx]) { - drm_dbg(&i915->drm, - "Invalid placement[%d], already occupied\n", idx); - return -EEXIST; - } - - if (get_user(num_siblings, &ext->num_siblings)) - return -EFAULT; - - err = check_user_mbz(&ext->flags); - if (err) - return err; - - err = check_user_mbz(&ext->mbz64); - if (err) - return err; - - siblings = stack; - if (num_siblings > ARRAY_SIZE(stack)) { - siblings = kmalloc_array(num_siblings, -sizeof(*siblings), -GFP_KERNEL); - if (!siblings) - return -ENOMEM; - } - - for (n = 0; n < num_siblings; n++) { - struct i915_engine_class_instance ci; - - if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) { - err = -EFAULT; - goto out_siblings; - } - - siblings[n] = intel_engine_lookup_user(i915, - ci.engine_class, - ci.engine_instance); - if (!siblings[n]) { - drm_dbg(&i915->drm, - "Invalid sibling[%d]: { class:%d, inst:%d }\n", - n, ci.engine_class, ci.engine_instance); - err = -EINVAL; - goto out_siblings; - } - } - - ce = intel_execlists_create_virtual(siblings, n); - if (IS_ERR(ce)) { - err = PTR_ERR(ce); - goto out_siblings; - } - - intel_context_set_gem(ce, set->ctx, null_sseu); - - if (cmpxchg(&set->engines->engines[idx], NULL, ce)) { - intel_context_put(ce); - err = -EEXIST; - goto out_siblings; - } - -out_siblings: - if (siblings != stack) - kfree(siblings); - - return err; -} - -static int -set_engines__bond(struct i915_user_extension __user *base, void *data) -{ - struct i915_context_engines_bond __user *ext = - co
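After this change, an engine set can only be supplied while the context is being created. A setparam on a live context no longer reaches any engine-replacement code. The sketch below shows the resulting dispatch, assuming hypothetical `PARAM_*` identifiers; the real `I915_CONTEXT_PARAM_*` values are not reproduced. Priority is kept as an example of a parameter that the series describes as harmless to change on a live context.

```c
#include <errno.h>

/* Hypothetical parameter IDs; the real I915_CONTEXT_PARAM_* values are
 * defined in the i915 uAPI header and are not reproduced here. */
enum ctx_param {
	PARAM_PRIORITY,
	PARAM_ENGINES,
};

struct live_ctx {
	int priority;
};

/* setparam on a fully created context. */
static int ctx_setparam(struct live_ctx *ctx, enum ctx_param param, int value)
{
	switch (param) {
	case PARAM_PRIORITY:
		/* Cheap and race-free: just update the field. */
		ctx->priority = value;
		return 0;
	case PARAM_ENGINES:
		/* Only accepted at creation time now. */
	default:
		return -EINVAL;
	}
}
```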
[PATCH 22/29] drm/i915/gem: Return an error ptr from context_lookup
We're about to start doing lazy context creation which means contexts get created in i915_gem_context_lookup and we may start having more errors than -ENOENT. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c| 12 ++-- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++-- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index d68c111bc824a..76662175e6980 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -2636,8 +2636,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, int ret = 0; ctx = i915_gem_context_lookup(file_priv, args->ctx_id); - if (!ctx) - return -ENOENT; + if (IS_ERR(ctx)) + return PTR_ERR(ctx); switch (args->param) { case I915_CONTEXT_PARAM_GTT_SIZE: @@ -2705,8 +2705,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, int ret; ctx = i915_gem_context_lookup(file_priv, args->ctx_id); - if (!ctx) - return -ENOENT; + if (IS_ERR(ctx)) + return PTR_ERR(ctx); ret = ctx_setparam(file_priv, ctx, args); @@ -2725,8 +2725,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, return -EINVAL; ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id); - if (!ctx) - return -ENOENT; + if (IS_ERR(ctx)) + return PTR_ERR(ctx); /* * We opt for unserialised reads here. 
This may result in tearing diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 7024adcd5cf15..de14b26f3b2d5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -739,8 +739,8 @@ static int eb_select_context(struct i915_execbuffer *eb) struct i915_gem_context *ctx; ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->rsvd1); - if (unlikely(!ctx)) - return -ENOENT; + if (unlikely(IS_ERR(ctx))) + return PTR_ERR(ctx); eb->gem_context = ctx; if (rcu_access_pointer(ctx->vm)) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index fee2342219da1..d7bd732ceacfc 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1868,7 +1868,7 @@ i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id) ctx = NULL; rcu_read_unlock(); - return ctx; + return ctx ? ctx : ERR_PTR(-ENOENT); } static inline struct i915_address_space * diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index de8ebc34af0ff..dfc2a5c067c29 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3414,10 +3414,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf, struct drm_i915_file_private *file_priv = file->driver_priv; specific_ctx = i915_gem_context_lookup(file_priv, ctx_handle); - if (!specific_ctx) { + if (IS_ERR(specific_ctx)) { DRM_DEBUG("Failed to look up context with ID %u for opening perf stream\n", ctx_handle); - ret = -ENOENT; + ret = PTR_ERR(specific_ctx); goto err; } } -- 2.31.1
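The calling convention adopted here is the kernel's `<linux/err.h>` pattern. Small negative errno values are encoded in the pointer itself, so a lookup can report -ENOENT today and other errors once lazy creation lands. The sketch below re-creates `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` in userspace for illustration only. It assumes an LP64 target where `long` and pointers have the same width. `context_lookup()` and `getparam()` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace re-creation of <linux/err.h>: errors -1..-MAX_ERRNO live in
 * the top page of the address space, where no valid object can be. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct context {
	unsigned int id;
};

/* Hypothetical table standing in for the per-file context xarray. */
static struct context contexts[] = { { 1 }, { 2 } };

/* Like i915_gem_context_lookup() after this patch: a miss is reported as
 * ERR_PTR(-ENOENT) instead of NULL. */
static struct context *context_lookup(unsigned int id)
{
	for (size_t i = 0; i < sizeof(contexts) / sizeof(contexts[0]); i++)
		if (contexts[i].id == id)
			return &contexts[i];
	return ERR_PTR(-ENOENT);
}

/* Caller pattern from the ioctls: forward whatever error the lookup
 * produced rather than hard-coding -ENOENT. */
static int getparam(unsigned int id)
{
	struct context *ctx = context_lookup(id);

	if (IS_ERR(ctx))
		return (int)PTR_ERR(ctx);
	return 0;
}
```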
[PATCH 27/29] drm/i915/selftests: Take a VM in kernel_context()
This better models where we want to go with contexts in general where things like the VM and engine set are create parameters instead of being set after the fact. Signed-off-by: Jason Ekstrand --- .../drm/i915/gem/selftests/i915_gem_context.c | 4 ++-- .../gpu/drm/i915/gem/selftests/mock_context.c | 9 - .../gpu/drm/i915/gem/selftests/mock_context.h | 4 +++- drivers/gpu/drm/i915/gt/selftest_execlists.c | 20 +-- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 +- 5 files changed, 24 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 506cd9e9d4b25..aee5642818824 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -680,7 +680,7 @@ static int igt_ctx_exec(void *arg) struct i915_gem_context *ctx; struct intel_context *ce; - ctx = kernel_context(i915); + ctx = kernel_context(i915, NULL); if (IS_ERR(ctx)) { err = PTR_ERR(ctx); goto out_file; @@ -813,7 +813,7 @@ static int igt_shared_ctx_exec(void *arg) struct i915_gem_context *ctx; struct intel_context *ce; - ctx = kernel_context(i915); + ctx = kernel_context(i915, NULL); if (IS_ERR(ctx)) { err = PTR_ERR(ctx); goto out_test; diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c index 61aaac4a334cf..500ef27ba4771 100644 --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c @@ -150,7 +150,8 @@ live_context_for_engine(struct intel_engine_cs *engine, struct file *file) } struct i915_gem_context * -kernel_context(struct drm_i915_private *i915) +kernel_context(struct drm_i915_private *i915, + struct i915_address_space *vm) { struct i915_gem_context *ctx; struct i915_gem_proto_context *pc; @@ -159,6 +160,12 @@ kernel_context(struct drm_i915_private *i915) if (IS_ERR(pc)) return ERR_CAST(pc); + if (vm) { + if (pc->vm) + 
i915_vm_put(pc->vm); + pc->vm = i915_vm_get(vm); + } + ctx = i915_gem_create_context(i915, pc); proto_context_close(pc); if (IS_ERR(ctx)) diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.h b/drivers/gpu/drm/i915/gem/selftests/mock_context.h index 2a6121d33352d..7a02fd9b5866a 100644 --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.h +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.h @@ -10,6 +10,7 @@ struct file; struct drm_i915_private; struct intel_engine_cs; +struct i915_address_space; void mock_init_contexts(struct drm_i915_private *i915); @@ -25,7 +26,8 @@ live_context(struct drm_i915_private *i915, struct file *file); struct i915_gem_context * live_context_for_engine(struct intel_engine_cs *engine, struct file *file); -struct i915_gem_context *kernel_context(struct drm_i915_private *i915); +struct i915_gem_context *kernel_context(struct drm_i915_private *i915, + struct i915_address_space *vm); void kernel_context_close(struct i915_gem_context *ctx); #endif /* !__MOCK_CONTEXT_H */ diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index a0e75b71a3374..0989e024f7a03 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -1522,12 +1522,12 @@ static int live_busywait_preempt(void *arg) * preempt the busywaits used to synchronise between rings. 
*/ - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) return -ENOMEM; ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY; @@ -1724,12 +1724,12 @@ static int live_preempt(void *arg) if (igt_spinner_init(&spin_lo, gt)) goto err_spin_hi; - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) goto err_spin_lo; ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIO
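The new `vm` argument replaces whatever VM `proto_context_create()` allocated, so a reference has to be dropped on the old VM and taken on the caller's. The following is a minimal refcounting sketch of that swap. `struct vm`, `vm_get()`, `vm_put()` and `proto_ctx_set_vm()` are hypothetical stand-ins for `i915_vm_get()`/`i915_vm_put()` and the inline code in `kernel_context()`. This toy `vm_put()` only decrements rather than freeing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted address space. */
struct vm {
	int refcount;
};

static struct vm *vm_get(struct vm *vm)
{
	vm->refcount++;
	return vm;
}

static void vm_put(struct vm *vm)
{
	assert(vm->refcount > 0);
	vm->refcount--;		/* a real implementation frees at zero */
}

struct proto_ctx {
	struct vm *vm;
};

/* Override the proto-context's default VM with a caller-supplied one.
 * NULL keeps the default. The new reference is taken before the old one
 * is dropped, so passing the VM already installed is also safe. */
static void proto_ctx_set_vm(struct proto_ctx *pc, struct vm *vm)
{
	struct vm *old = pc->vm;

	if (!vm)
		return;
	pc->vm = vm_get(vm);
	if (old)
		vm_put(old);
}
```

The patch itself puts the old VM first and then takes the new reference. That is fine there because the caller holds its own reference on `vm`; the get-then-put order above simply does not depend on that.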
[PATCH 08/29] drm/i915: Drop getparam support for I915_CONTEXT_PARAM_ENGINES
This has never been used by any userspace except IGT and provides no real functionality beyond parroting back parameters userspace passed in as part of context creation or via setparam. If the context is in legacy mode (where you use I915_EXEC_RENDER and friends), it returns success with zero data so it's not useful for discovering what engines are in the context. It's also not a replacement for the recently removed I915_CONTEXT_CLONE_ENGINES because it doesn't return any of the balancing or bonding information. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 77 + 1 file changed, 1 insertion(+), 76 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index aa792c9517e16..fed3538de9241 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1724,78 +1724,6 @@ set_engines(struct i915_gem_context *ctx, return 0; } -static int -get_engines(struct i915_gem_context *ctx, - struct drm_i915_gem_context_param *args) -{ - struct i915_context_param_engines __user *user; - struct i915_gem_engines *e; - size_t n, count, size; - bool user_engines; - int err = 0; - - e = __context_engines_await(ctx, &user_engines); - if (!e) - return -ENOENT; - - if (!user_engines) { - i915_sw_fence_complete(&e->fence); - args->size = 0; - return 0; - } - - count = e->num_engines; - - /* Be paranoid in case we have an impedance mismatch */ - if (!check_struct_size(user, engines, count, &size)) { - err = -EINVAL; - goto err_free; - } - if (overflows_type(size, args->size)) { - err = -EINVAL; - goto err_free; - } - - if (!args->size) { - args->size = size; - goto err_free; - } - - if (args->size < size) { - err = -EINVAL; - goto err_free; - } - - user = u64_to_user_ptr(args->value); - if (put_user(0, &user->extensions)) { - err = -EFAULT; - goto err_free; - } - - for (n = 0; n < count; n++) { - struct 
i915_engine_class_instance ci = { - .engine_class = I915_ENGINE_CLASS_INVALID, - .engine_instance = I915_ENGINE_CLASS_INVALID_NONE, - }; - - if (e->engines[n]) { - ci.engine_class = e->engines[n]->engine->uabi_class; - ci.engine_instance = e->engines[n]->engine->uabi_instance; - } - - if (copy_to_user(&user->engines[n], &ci, sizeof(ci))) { - err = -EFAULT; - goto err_free; - } - } - - args->size = size; - -err_free: - i915_sw_fence_complete(&e->fence); - return err; -} - static int set_persistence(struct i915_gem_context *ctx, const struct drm_i915_gem_context_param *args) @@ -2126,10 +2054,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, ret = get_ppgtt(file_priv, ctx, args); break; - case I915_CONTEXT_PARAM_ENGINES: - ret = get_engines(ctx, args); - break; - case I915_CONTEXT_PARAM_PERSISTENCE: args->size = 0; args->value = i915_gem_context_is_persistent(ctx); @@ -2137,6 +2061,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: + case I915_CONTEXT_PARAM_ENGINES: case I915_CONTEXT_PARAM_RINGSIZE: default: ret = -EINVAL; -- 2.31.1
[PATCH 25/29] drm/i915/gem: Don't allow changing the VM on running contexts (v2)
When the APIs were added to manage VMs more directly from userspace, the questionable choice was made to allow changing out the VM on a context at any time. This is horribly racy and there's absolutely no reason why any userspace would want to do this outside of testing that exact race. By removing support for CONTEXT_PARAM_VM from ctx_setparam, we make it impossible to change out the VM after the context has been fully created. This lets us delete a bunch of deferred task code as well as a duplicated (and slightly different) copy of the code which programs the PPGTT registers. v2 (Jason Ekstrand): - Expand the commit message Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 262 -- .../gpu/drm/i915/gem/i915_gem_context_types.h | 2 +- .../drm/i915/gem/selftests/i915_gem_context.c | 119 .../drm/i915/selftests/i915_mock_selftests.h | 1 - 4 files changed, 1 insertion(+), 383 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index f7c83730ee07f..a528c8f3354a0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1633,120 +1633,6 @@ int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data, return 0; } -struct context_barrier_task { - struct i915_active base; - void (*task)(void *data); - void *data; -}; - -static void cb_retire(struct i915_active *base) -{ - struct context_barrier_task *cb = container_of(base, typeof(*cb), base); - - if (cb->task) - cb->task(cb->data); - - i915_active_fini(&cb->base); - kfree(cb); -} - -I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault); -static int context_barrier_task(struct i915_gem_context *ctx, - intel_engine_mask_t engines, - bool (*skip)(struct intel_context *ce, void *data), - int (*pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void *data), - int (*emit)(struct i915_request *rq, void *data), - void (*task)(void *data), - void 
*data) -{ - struct context_barrier_task *cb; - struct i915_gem_engines_iter it; - struct i915_gem_engines *e; - struct i915_gem_ww_ctx ww; - struct intel_context *ce; - int err = 0; - - GEM_BUG_ON(!task); - - cb = kmalloc(sizeof(*cb), GFP_KERNEL); - if (!cb) - return -ENOMEM; - - i915_active_init(&cb->base, NULL, cb_retire, 0); - err = i915_active_acquire(&cb->base); - if (err) { - kfree(cb); - return err; - } - - e = __context_engines_await(ctx, NULL); - if (!e) { - i915_active_release(&cb->base); - return -ENOENT; - } - - for_each_gem_engine(ce, e, it) { - struct i915_request *rq; - - if (I915_SELFTEST_ONLY(context_barrier_inject_fault & - ce->engine->mask)) { - err = -ENXIO; - break; - } - - if (!(ce->engine->mask & engines)) - continue; - - if (skip && skip(ce, data)) - continue; - - i915_gem_ww_ctx_init(&ww, true); -retry: - err = intel_context_pin_ww(ce, &ww); - if (err) - goto err; - - if (pin) - err = pin(ce, &ww, data); - if (err) - goto err_unpin; - - rq = i915_request_create(ce); - if (IS_ERR(rq)) { - err = PTR_ERR(rq); - goto err_unpin; - } - - err = 0; - if (emit) - err = emit(rq, data); - if (err == 0) - err = i915_active_add_request(&cb->base, rq); - - i915_request_add(rq); -err_unpin: - intel_context_unpin(ce); -err: - if (err == -EDEADLK) { - err = i915_gem_ww_ctx_backoff(&ww); - if (!err) - goto retry; - } - i915_gem_ww_ctx_fini(&ww); - - if (err) - break; - } - i915_sw_fence_complete(&e->fence); - - cb->task = err ? NULL : task; /* caller needs to unwind instead */ - cb->data = data; - - i915_active_release(&cb->base); - - return err; -} - static int get_ppgtt(struct drm_i915_file_private *file_priv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) @@ -1779,150 +1665,6 @@ static int get_ppgtt(struct drm_i
[PATCH 24/29] drm/i915/gem: Delay context creation
The current context uAPI allows for two methods of setting context parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The former is allowed to be called at any time while the latter happens as part of GEM_CONTEXT_CREATE. Currently, everything settable via one is settable via the other. While some params are fairly simple and setting them on a live context is harmless, such as the context priority, others are far trickier, such as the VM or the set of engines. In order to swap out the VM, for instance, we have to delay until all current in-flight work is complete, swap in the new VM, and then continue. This leads to a plethora of potential race conditions we'd really rather avoid. In previous patches, we added an i915_gem_proto_context struct which is capable of storing and tracking all such create parameters. This commit delays the creation of the actual context until after the client is done configuring it with SET_CONTEXT_PARAM. From the perspective of the client, it has the same u32 context ID the whole time. From the perspective of i915, however, it's an i915_gem_proto_context right up until the point where we attempt to do something which the proto-context can't handle, at which point the real context gets created. This is accomplished via a little xarray dance. When GEM_CONTEXT_CREATE is called, we create a proto-context, reserve a slot in context_xa but leave it NULL, and store the proto-context in the corresponding slot in proto_context_xa. Then, whenever we go to look up a context, we first check context_xa. If it's there, we return the i915_gem_context and we're done. If it's not, we look in proto_context_xa and, if we find it there, we create the actual context and kill the proto-context. In order for this dance to work properly, everything which ever touches a proto-context is guarded by drm_i915_file_private::proto_context_lock, including context creation.
Yes, this means context creation now takes a giant global lock but it can't really be helped and that should never be on any driver's fast-path anyway. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 211 ++ drivers/gpu/drm/i915/gem/i915_gem_context.h | 3 + .../gpu/drm/i915/gem/i915_gem_context_types.h | 54 + .../gpu/drm/i915/gem/selftests/mock_context.c | 5 +- drivers/gpu/drm/i915/i915_drv.h | 24 +- 5 files changed, 239 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 8288af0d33245..f7c83730ee07f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -298,6 +298,42 @@ proto_context_create(struct drm_i915_private *i915, unsigned int flags) return err; } +static int proto_context_register_locked(struct drm_i915_file_private *fpriv, +struct i915_gem_proto_context *pc, +u32 *id) +{ + int ret; + void *old; + + lockdep_assert_held(&fpriv->proto_context_lock); + + ret = xa_alloc(&fpriv->context_xa, id, NULL, xa_limit_32b, GFP_KERNEL); + if (ret) + return ret; + + old = xa_store(&fpriv->proto_context_xa, *id, pc, GFP_KERNEL); + if (xa_is_err(old)) { + xa_erase(&fpriv->context_xa, *id); + return xa_err(old); + } + GEM_BUG_ON(old); + + return 0; +} + +static int proto_context_register(struct drm_i915_file_private *fpriv, + struct i915_gem_proto_context *pc, + u32 *id) +{ + int ret; + + mutex_lock(&fpriv->proto_context_lock); + ret = proto_context_register_locked(fpriv, pc, id); + mutex_unlock(&fpriv->proto_context_lock); + + return ret; +} + static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv, struct i915_gem_proto_context *pc, const struct drm_i915_gem_context_param *args) @@ -1448,12 +1484,12 @@ void i915_gem_init__contexts(struct drm_i915_private *i915) init_contexts(&i915->gem.contexts); } -static int gem_context_register(struct i915_gem_context *ctx, - struct drm_i915_file_private 
*fpriv, - u32 *id) +static void gem_context_register(struct i915_gem_context *ctx, +struct drm_i915_file_private *fpriv, +u32 id) { struct drm_i915_private *i915 = ctx->i915; - int ret; + void *old; ctx->file_priv = fpriv; @@ -1462,19 +1498,12 @@ static int gem_context_register(struct i915_gem_context *ctx, current->comm, pid_nr(ctx->pid)); /* And finally expose ourselves to userspace via the idr */ - ret = xa_alloc(&fpriv->context_xa, id, ctx, xa_limit_32b, GFP_KERNEL); - if (
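The lookup half of the "xarray dance" is essentially double-checked lazy initialization. The sketch below models it in userspace under stated assumptions. Two fixed-size tables stand in for `context_xa` and `proto_context_xa`, and a pthread mutex stands in for `proto_context_lock`. C11 atomics replace the RCU-protected `xa_load()` fast path. Every name (`file_priv`, `proto_register()`, `ctx_lookup()`) is hypothetical, and NULL stands in for the error pointers the real code returns.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_CTX 8	/* assumption: tiny fixed table instead of an xarray */

struct proto_ctx { int priority; };
struct ctx { int priority; };

struct file_priv {
	pthread_mutex_t proto_lock;		/* role of proto_context_lock */
	int reserved[MAX_CTX];			/* id handed out to userspace */
	_Atomic(struct ctx *) contexts[MAX_CTX]; /* role of context_xa */
	struct proto_ctx *protos[MAX_CTX];	/* role of proto_context_xa */
};

static void file_priv_init(struct file_priv *f)
{
	pthread_mutex_init(&f->proto_lock, NULL);
	for (int i = 0; i < MAX_CTX; i++) {
		f->reserved[i] = 0;
		f->protos[i] = NULL;
		atomic_init(&f->contexts[i], NULL);
	}
}

/* CONTEXT_CREATE: reserve an id whose real-context slot stays NULL and
 * park the proto-context in the parallel table. Returns the id, or -1 if
 * the table is full (the caller then still owns pc). */
static int proto_register(struct file_priv *f, struct proto_ctx *pc)
{
	int id = -1;

	pthread_mutex_lock(&f->proto_lock);
	for (int i = 0; i < MAX_CTX; i++) {
		if (!f->reserved[i]) {
			f->reserved[i] = 1;
			f->protos[i] = pc;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&f->proto_lock);
	return id;
}

/* Lookup: the lock-free fast path returns an existing context; otherwise,
 * under the lock, promote the proto-context to a real context. */
static struct ctx *ctx_lookup(struct file_priv *f, int id)
{
	struct ctx *ctx;

	if (id < 0 || id >= MAX_CTX)
		return NULL;

	ctx = atomic_load_explicit(&f->contexts[id], memory_order_acquire);
	if (ctx)
		return ctx;

	pthread_mutex_lock(&f->proto_lock);
	/* Re-check: another thread may have promoted it meanwhile. */
	ctx = atomic_load_explicit(&f->contexts[id], memory_order_relaxed);
	if (!ctx && f->protos[id]) {
		ctx = malloc(sizeof(*ctx));
		if (ctx) {
			ctx->priority = f->protos[id]->priority;
			atomic_store_explicit(&f->contexts[id], ctx,
					      memory_order_release);
			free(f->protos[id]);	/* "kill" the proto-context */
			f->protos[id] = NULL;
		}
	}
	pthread_mutex_unlock(&f->proto_lock);
	return ctx;
}
```

As in the commit message, the only serialization point is the per-file lock taken on the slow path; once a context exists, lookups never touch it.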
[PATCH 23/29] drm/i915/gt: Drop i915_address_space::file (v2)
There's a big comment saying how useful it is but no one is using this for anything anymore. It was added in 2bfa996e031b ("drm/i915: Store owning file on the i915_address_space") and used for debugfs at the time as well as telling the difference between the global GTT and a PPGTT. In f6e8aa387171 ("drm/i915: Report the number of closed vma held by each context in debugfs") we removed one use of it by switching to a context walk and comparing with the VM in the context. Finally, VM stats for debugfs were entirely nuked in db80a1294c23 ("drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects") v2 (Daniel Vetter): - Delete a struct drm_i915_file_private pre-declaration - Add a comment to the commit message about history Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 9 - drivers/gpu/drm/i915/gt/intel_gtt.h | 11 --- drivers/gpu/drm/i915/selftests/mock_gtt.c | 1 - 3 files changed, 21 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 76662175e6980..8288af0d33245 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1453,17 +1453,10 @@ static int gem_context_register(struct i915_gem_context *ctx, u32 *id) { struct drm_i915_private *i915 = ctx->i915; - struct i915_address_space *vm; int ret; ctx->file_priv = fpriv; - mutex_lock(&ctx->mutex); - vm = i915_gem_context_vm(ctx); - if (vm) - WRITE_ONCE(vm->file, fpriv); /* XXX */ - mutex_unlock(&ctx->mutex); - ctx->pid = get_task_pid(current, PIDTYPE_PID); snprintf(ctx->name, sizeof(ctx->name), "%s[%d]", current->comm, pid_nr(ctx->pid)); @@ -1562,8 +1555,6 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data, if (IS_ERR(ppgtt)) return PTR_ERR(ppgtt); - ppgtt->vm.file = file_priv; - if (args->extensions) { err = i915_user_extensions(u64_to_user_ptr(args->extensions), NULL, 0, diff --git 
a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index ca00b45827b74..cbd89fded6f2a 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -140,7 +140,6 @@ typedef u64 gen8_pte_t; enum i915_cache_level; -struct drm_i915_file_private; struct drm_i915_gem_object; struct i915_fence_reg; struct i915_vma; @@ -220,16 +219,6 @@ struct i915_address_space { struct intel_gt *gt; struct drm_i915_private *i915; struct device *dma; - /* -* Every address space belongs to a struct file - except for the global -* GTT that is owned by the driver (and so @file is set to NULL). In -* principle, no information should leak from one context to another -* (or between files/processes etc) unless explicitly shared by the -* owner. Tracking the owner is important in order to free up per-file -* objects along with the file, to aide resource tracking, and to -* assign blame. -*/ - struct drm_i915_file_private *file; u64 total; /* size addr space maps (ex. 2GB for ggtt) */ u64 reserved; /* size addr space reserved */ diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index 5c7ae40bba634..cc047ec594f93 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -73,7 +73,6 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name) ppgtt->vm.gt = &i915->gt; ppgtt->vm.i915 = i915; ppgtt->vm.total = round_down(U64_MAX, PAGE_SIZE); - ppgtt->vm.file = ERR_PTR(-ENODEV); ppgtt->vm.dma = i915->drm.dev; i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT); -- 2.31.1
[PATCH 20/29] drm/i915/gem: Make an alignment check more sensible
What we really want to check is that the size of the engines array, i.e. args->size - sizeof(*user), is divisible by the element size, i.e. sizeof(*user->engines), because that's what's required for computing the array length right below the check. However, we're currently not doing this and are instead doing a compile-time check that sizeof(*user) is divisible by sizeof(*user->engines) and avoiding the subtraction. As far as I can tell, the only reason for the more confusing pair of checks is to avoid a single subtraction of a constant. The other thing the BUILD_BUG_ON might be trying to implicitly check is that offsetof(user->engines) == sizeof(*user) and that we don't have any weird padding throwing us off. However, that's not the check it's doing and it's not even a reliable way to do that check. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 12a148ba421b6..cf7c281977a3e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1758,9 +1758,8 @@ set_engines(struct i915_gem_context *ctx, goto replace; } - BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->engines))); if (args->size < sizeof(*user) || - !IS_ALIGNED(args->size, sizeof(*user->engines))) { + !IS_ALIGNED(args->size - sizeof(*user), sizeof(*user->engines))) { drm_dbg(&i915->drm, "Invalid size for engine array: %d\n", args->size); return -EINVAL; -- 2.31.1
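The new check encodes this rule: a buffer of `size` bytes holds a header plus a whole number of elements exactly when `size >= hdr` and `size - hdr` is a multiple of the element size. The old `IS_ALIGNED(args->size, ...)` test only agrees with this when the header size is itself such a multiple, which is what the removed `BUILD_BUG_ON` enforced. Below is a small sketch using structs modeled on, but not copied from, the engines uAPI, with a hypothetical `engine_count()` helper. `%` is used in place of `IS_ALIGNED()`, which is equivalent here because the 4-byte element size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Modeled on the engines uAPI: a fixed header followed by a flexible
 * array of fixed-size elements. Field names are illustrative. */
struct engine_ci {
	uint16_t engine_class;
	uint16_t engine_instance;
};

struct engines_param {
	uint64_t extensions;
	struct engine_ci engines[];
};

/* Returns the number of array elements, or -1 if 'size' cannot describe
 * a header plus a whole number of elements. This is exactly the
 * condition the division below relies on. */
static long engine_count(size_t size)
{
	const size_t hdr = sizeof(struct engines_param);
	const size_t elem = sizeof(struct engine_ci);

	if (size < hdr || (size - hdr) % elem != 0)
		return -1;
	return (long)((size - hdr) / elem);
}
```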
[PATCH 21/29] drm/i915/gem: Use the proto-context to handle create parameters (v2)
This means that the proto-context needs to grow support for engine configuration information as well as setparam logic. Fortunately, we'll be deleting a lot of setparam logic on the primary context shortly so it will hopefully balance out. There's an extra bit of fun here when it comes to setting SSEU and the way it interacts with PARAM_ENGINES. Unfortunately, thanks to SET_CONTEXT_PARAM and not being allowed to pick the order in which we handle certain parameters, we have think about those interactions. v2 (Daniel Vetter): - Add a proto_context_free_user_engines helper - Comment on SSEU in the commit message - Use proto_context_set_persistence in set_proto_ctx_param Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 552 +- .../gpu/drm/i915/gem/i915_gem_context_types.h | 58 ++ 2 files changed, 588 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index cf7c281977a3e..d68c111bc824a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -191,10 +191,24 @@ static int validate_priority(struct drm_i915_private *i915, return 0; } +static void proto_context_free_user_engines(struct i915_gem_proto_context *pc) +{ + int i; + + if (pc->user_engines) { + for (i = 0; i < pc->num_user_engines; i++) + kfree(pc->user_engines[i].siblings); + kfree(pc->user_engines); + } + pc->user_engines = NULL; + pc->num_user_engines = -1; +} + static void proto_context_close(struct i915_gem_proto_context *pc) { if (pc->vm) i915_vm_put(pc->vm); + proto_context_free_user_engines(pc); kfree(pc); } @@ -211,7 +225,7 @@ static int proto_context_set_persistence(struct drm_i915_private *i915, if (!i915->params.enable_hangcheck) return -EINVAL; - __set_bit(UCONTEXT_PERSISTENCE, &pc->user_flags); + pc->user_flags |= BIT(UCONTEXT_PERSISTENCE); } else { /* To cancel a context we use "preempt-to-idle" */ if (!(i915->caps.scheduler & 
I915_SCHEDULER_CAP_PREEMPTION)) @@ -233,7 +247,7 @@ static int proto_context_set_persistence(struct drm_i915_private *i915, if (!intel_has_reset_engine(&i915->gt)) return -ENODEV; - __clear_bit(UCONTEXT_PERSISTENCE, &pc->user_flags); + pc->user_flags &= ~BIT(UCONTEXT_PERSISTENCE); } return 0; @@ -248,6 +262,9 @@ proto_context_create(struct drm_i915_private *i915, unsigned int flags) if (!pc) return ERR_PTR(-ENOMEM); + pc->num_user_engines = -1; + pc->user_engines = NULL; + if (HAS_FULL_PPGTT(i915)) { struct i915_ppgtt *ppgtt; @@ -261,9 +278,8 @@ proto_context_create(struct drm_i915_private *i915, unsigned int flags) pc->vm = &ppgtt->vm; } - pc->user_flags = 0; - __set_bit(UCONTEXT_BANNABLE, &pc->user_flags); - __set_bit(UCONTEXT_RECOVERABLE, &pc->user_flags); + pc->user_flags = BIT(UCONTEXT_BANNABLE) | +BIT(UCONTEXT_RECOVERABLE); proto_context_set_persistence(i915, pc, true); pc->sched.priority = I915_PRIORITY_NORMAL; @@ -282,6 +298,429 @@ proto_context_create(struct drm_i915_private *i915, unsigned int flags) return err; } +static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv, + struct i915_gem_proto_context *pc, + const struct drm_i915_gem_context_param *args) +{ + struct i915_address_space *vm; + + if (args->size) + return -EINVAL; + + if (!pc->vm) + return -ENODEV; + + if (upper_32_bits(args->value)) + return -ENOENT; + + vm = i915_gem_vm_lookup(fpriv, args->value); + if (!vm) + return -ENOENT; + + i915_vm_put(pc->vm); + pc->vm = vm; + + return 0; +} + +struct set_proto_ctx_engines { + struct drm_i915_private *i915; + unsigned num_engines; + struct i915_gem_proto_engine *engines; +}; + +static int +set_proto_ctx_engines_balance(struct i915_user_extension __user *base, + void *data) +{ + struct i915_context_engines_load_balance __user *ext = + container_of_user(base, typeof(*ext), base); + const struct set_proto_ctx_engines *set = data; + struct drm_i915_private *i915 = set->i915; + struct intel_engine_cs **siblings; + u16 num_siblings, idx; + 
unsigned int n; + int err; + + if (!HAS_EXECLISTS(i915)) + return -ENODEV; + + if (intel_uc_uses_guc_submission(&i915->gt.uc)) + return -ENODEV; /* not implement yet */ + +
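The engine-set bookkeeping in this patch uses `num_user_engines = -1` as a sentinel for "no user engines, fall back to the defaults", and `proto_context_free_user_engines()` always resets to that state. Below is a minimal userspace sketch of that pattern, assuming trimmed-down stand-ins for `i915_gem_proto_engine`/`i915_gem_proto_context`; the `BIT()` macro, the `UCONTEXT_*` values and the `proto_context_set_engines()` helper are hypothetical illustrations, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's BIT() macro and UCONTEXT_* bit
 * numbers; the numeric values here are illustrative assumptions. */
#define BIT(n) (1UL << (n))
enum { UCONTEXT_BANNABLE = 1, UCONTEXT_RECOVERABLE };

/* Hypothetical, trimmed-down models of the proto engine/context structs. */
struct proto_engine {
	int num_siblings;
	int *siblings; /* heap-allocated; NULL when unused */
};

struct proto_context {
	unsigned long user_flags;
	int num_user_engines; /* -1: no user engine set, use the defaults */
	struct proto_engine *user_engines;
};

static void proto_context_free_user_engines(struct proto_context *pc)
{
	if (pc->user_engines) {
		for (int i = 0; i < pc->num_user_engines; i++)
			free(pc->user_engines[i].siblings);
		free(pc->user_engines);
	}
	/* Back to the "default engines" state, so calling this twice is safe. */
	pc->user_engines = NULL;
	pc->num_user_engines = -1;
}

static void proto_context_init(struct proto_context *pc)
{
	pc->num_user_engines = -1;
	pc->user_engines = NULL;
	/* Plain mask arithmetic, as in the patch, instead of __set_bit(). */
	pc->user_flags = BIT(UCONTEXT_BANNABLE) | BIT(UCONTEXT_RECOVERABLE);
}

/* Hypothetical setter: replaces any previously configured engine set. */
static int proto_context_set_engines(struct proto_context *pc, int count,
				     int siblings_per_engine)
{
	assert(count > 0 && siblings_per_engine > 0);
	proto_context_free_user_engines(pc);

	pc->user_engines = calloc(count, sizeof(*pc->user_engines));
	if (!pc->user_engines)
		return -ENOMEM;
	pc->num_user_engines = count;

	for (int i = 0; i < count; i++) {
		pc->user_engines[i].siblings =
			calloc(siblings_per_engine, sizeof(int));
		if (!pc->user_engines[i].siblings) {
			/* Unfilled slots are NULL thanks to calloc(). */
			proto_context_free_user_engines(pc);
			return -ENOMEM;
		}
		pc->user_engines[i].num_siblings = siblings_per_engine;
	}
	return 0;
}
```

Resetting to the sentinel inside the free helper means every caller (close, replace, error paths) leaves the proto-context in a well-defined state without extra code.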
[PATCH 14/29] drm/i915/gem: Add a separate validate_priority helper
With the proto-context stuff added later in this series, we end up having to duplicate set_priority. This lets us avoid duplicating the validation logic. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 42 + 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 910d31cb043e9..fc471243aa769 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -169,6 +169,28 @@ lookup_user_engine(struct i915_gem_context *ctx, return i915_gem_context_get_engine(ctx, idx); } +static int validate_priority(struct drm_i915_private *i915, +const struct drm_i915_gem_context_param *args) +{ + s64 priority = args->value; + + if (args->size) + return -EINVAL; + + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY)) + return -ENODEV; + + if (priority > I915_CONTEXT_MAX_USER_PRIORITY || + priority < I915_CONTEXT_MIN_USER_PRIORITY) + return -EINVAL; + + if (priority > I915_CONTEXT_DEFAULT_PRIORITY && + !capable(CAP_SYS_NICE)) + return -EPERM; + + return 0; +} + static struct i915_address_space * context_get_vm_rcu(struct i915_gem_context *ctx) { @@ -1744,23 +1766,13 @@ static void __apply_priority(struct intel_context *ce, void *arg) static int set_priority(struct i915_gem_context *ctx, const struct drm_i915_gem_context_param *args) { - s64 priority = args->value; - - if (args->size) - return -EINVAL; - - if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY)) - return -ENODEV; - - if (priority > I915_CONTEXT_MAX_USER_PRIORITY || - priority < I915_CONTEXT_MIN_USER_PRIORITY) - return -EINVAL; + int err; - if (priority > I915_CONTEXT_DEFAULT_PRIORITY && - !capable(CAP_SYS_NICE)) - return -EPERM; + err = validate_priority(ctx->i915, args); + if (err) + return err; - ctx->sched.priority = priority; + ctx->sched.priority = args->value; context_apply_all(ctx, 
__apply_priority, ctx); return 0; -- 2.31.1
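The point of the split is that validation becomes a side-effect-free check that both the live-context setter and a later proto-context setter can call. A userspace sketch of that shape follows; the priority limits are the values from the i915 uAPI header, while the two booleans standing in for `I915_SCHEDULER_CAP_PRIORITY` and `capable(CAP_SYS_NICE)` and the trimmed structs are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority limits as defined in include/uapi/drm/i915_drm.h. */
#define I915_CONTEXT_MAX_USER_PRIORITY 1023
#define I915_CONTEXT_DEFAULT_PRIORITY 0
#define I915_CONTEXT_MIN_USER_PRIORITY -1023

/* Trimmed model of struct drm_i915_gem_context_param. */
struct context_param {
	uint32_t size;
	uint64_t value;
};

struct context {
	int priority;
};

/* Pure check with no side effects, shareable by any setter. The booleans
 * stand in for the scheduler capability bit and capable(CAP_SYS_NICE). */
static int validate_priority(bool sched_has_priority, bool cap_sys_nice,
			     const struct context_param *args)
{
	int64_t priority = (int64_t)args->value;

	if (args->size)
		return -EINVAL;
	if (!sched_has_priority)
		return -ENODEV;
	if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
	    priority < I915_CONTEXT_MIN_USER_PRIORITY)
		return -EINVAL;
	/* Raising priority above the default is a privileged operation. */
	if (priority > I915_CONTEXT_DEFAULT_PRIORITY && !cap_sys_nice)
		return -EPERM;
	return 0;
}

static int set_priority(struct context *ctx, bool sched_has_priority,
			bool cap_sys_nice, const struct context_param *args)
{
	int err = validate_priority(sched_has_priority, cap_sys_nice, args);

	if (err)
		return err;
	/* Range already checked, so the narrowing conversion is safe. */
	ctx->priority = (int)(int64_t)args->value;
	return 0;
}
```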
[PATCH 18/29] drm/i915/gem: Optionally set SSEU in intel_context_set_gem
For now this is a no-op because everyone passes in a null SSEU but it lets us get some of the error handling and selftest refactoring plumbed through. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 41 +++ .../gpu/drm/i915/gem/selftests/mock_context.c | 6 ++- 2 files changed, 36 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index f8f3f514b4265..d247fb223aac7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -320,9 +320,12 @@ context_get_vm_rcu(struct i915_gem_context *ctx) } while (1); } -static void intel_context_set_gem(struct intel_context *ce, - struct i915_gem_context *ctx) +static int intel_context_set_gem(struct intel_context *ce, +struct i915_gem_context *ctx, +struct intel_sseu sseu) { + int ret = 0; + GEM_BUG_ON(rcu_access_pointer(ce->gem_context)); RCU_INIT_POINTER(ce->gem_context, ctx); @@ -349,6 +352,12 @@ static void intel_context_set_gem(struct intel_context *ce, intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000); } + + /* A valid SSEU has no zero fields */ + if (sseu.slice_mask && !WARN_ON(ce->engine->class != RENDER_CLASS)) + ret = intel_context_reconfigure_sseu(ce, sseu); + + return ret; } static void __free_engines(struct i915_gem_engines *e, unsigned int count) @@ -416,7 +425,8 @@ static struct i915_gem_engines *alloc_engines(unsigned int count) return e; } -static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) +static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx, + struct intel_sseu rcs_sseu) { const struct intel_gt *gt = &ctx->i915->gt; struct intel_engine_cs *engine; @@ -429,6 +439,8 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) for_each_engine(engine, gt, id) { struct intel_context *ce; + struct intel_sseu sseu = {}; + int ret; if (engine->legacy_idx == INVALID_ENGINE) 
continue; @@ -442,10 +454,18 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) goto free_engines; } - intel_context_set_gem(ce, ctx); - e->engines[engine->legacy_idx] = ce; e->num_engines = max(e->num_engines, engine->legacy_idx + 1); + + if (engine->class == RENDER_CLASS) + sseu = rcs_sseu; + + ret = intel_context_set_gem(ce, ctx, sseu); + if (ret) { + err = ERR_PTR(ret); + goto free_engines; + } + } return e; @@ -759,6 +779,7 @@ __create_context(struct drm_i915_private *i915, { struct i915_gem_context *ctx; struct i915_gem_engines *e; + struct intel_sseu null_sseu = {}; int err; int i; @@ -776,7 +797,7 @@ __create_context(struct drm_i915_private *i915, INIT_LIST_HEAD(&ctx->stale.engines); mutex_init(&ctx->engines_mutex); - e = default_engines(ctx); + e = default_engines(ctx, null_sseu); if (IS_ERR(e)) { err = PTR_ERR(e); goto err_free; @@ -1543,6 +1564,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) struct intel_engine_cs *stack[16]; struct intel_engine_cs **siblings; struct intel_context *ce; + struct intel_sseu null_sseu = {}; u16 num_siblings, idx; unsigned int n; int err; @@ -1615,7 +1637,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) goto out_siblings; } - intel_context_set_gem(ce, set->ctx); + intel_context_set_gem(ce, set->ctx, null_sseu); if (cmpxchg(&set->engines->engines[idx], NULL, ce)) { intel_context_put(ce); @@ -1723,6 +1745,7 @@ set_engines(struct i915_gem_context *ctx, struct drm_i915_private *i915 = ctx->i915; struct i915_context_param_engines __user *user = u64_to_user_ptr(args->value); + struct intel_sseu null_sseu = {}; struct set_engines set = { .ctx = ctx }; unsigned int num_engines, n; u64 extensions; @@ -1732,7 +1755,7 @@ set_engines(struct i915_gem_context *ctx, if (!i915_gem_context_user_engines(ctx)) return 0; - set.engines = default_engines(ctx); + set.engines = default_engines(ctx, null_sseu); if (IS_ERR(set.engines))
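The convention introduced here is that a zero-initialized `struct intel_sseu` means "leave the SSEU configuration alone", and only the render engine ever receives a caller-supplied value. The sketch below models that flow in userspace; the struct fields mirror `intel_sseu` as best understood, and `reconfigure_sseu()` plus the `reconfigured` counter are hypothetical stand-ins for `intel_context_reconfigure_sseu()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative subset of engine classes. */
enum engine_class { RENDER_CLASS, VIDEO_DECODE_CLASS };

/* Modelled on struct intel_sseu; a zero-initialised value ({0}) acts as
 * the "null SSEU" that means "keep the default configuration". */
struct sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

struct context {
	enum engine_class engine_class;
	struct sseu sseu;
	int reconfigured; /* how often reconfigure_sseu() ran */
};

/* Hypothetical stand-in for intel_context_reconfigure_sseu(). */
static int reconfigure_sseu(struct context *ce, struct sseu sseu)
{
	ce->sseu = sseu;
	ce->reconfigured++;
	return 0;
}

static int context_set_gem(struct context *ce, struct sseu sseu)
{
	int ret = 0;

	/* A valid SSEU has no zero fields, so slice_mask == 0 means "none". */
	if (sseu.slice_mask) {
		if (ce->engine_class != RENDER_CLASS)
			fprintf(stderr, "SSEU for non-render engine\n"); /* kernel: WARN_ON */
		else
			ret = reconfigure_sseu(ce, sseu);
	}
	return ret;
}

/* Only the render engine receives rcs_sseu; every other engine gets {0}. */
static int default_engines(struct context *engines, int count,
			   struct sseu rcs_sseu)
{
	for (int i = 0; i < count; i++) {
		struct sseu sseu = {0};
		int ret;

		if (engines[i].engine_class == RENDER_CLASS)
			sseu = rcs_sseu;
		ret = context_set_gem(&engines[i], sseu);
		if (ret)
			return ret;
	}
	return 0;
}
```

Passing the struct by value with `{0}` as the "no-op" argument is why every existing caller in the patch can simply hand in a local `null_sseu`.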
[PATCH 09/29] drm/i915/gem: Disallow bonding of virtual engines (v3)
This adds a bunch of complexity which the media driver has never actually used. The media driver does technically bond a balanced engine to another engine but the balanced engine only has one engine in the sibling set. This doesn't actually result in a virtual engine. This functionality was originally added to handle cases where we may have more than two video engines and media might want to load-balance their bonded submits by, for instance, submitting to a balanced vcs0-1 as the primary and then vcs2-3 as the secondary. However, no such hardware has shipped thus far and, if we ever want to enable such use-cases in the future, we'll use the up-and-coming parallel submit API which targets GuC submission. This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op. We leave the validation code in place in case we ever decide we want to do something interesting with the bonding information. v2 (Jason Ekstrand): - Don't delete quite as much code. v3 (Tvrtko Ursulin): - Add some history to the commit message Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 18 +- .../drm/i915/gt/intel_execlists_submission.c | 69 -- .../drm/i915/gt/intel_execlists_submission.h | 4 - drivers/gpu/drm/i915/gt/selftest_execlists.c | 229 -- 4 files changed, 6 insertions(+), 314 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index fed3538de9241..5e159fb526631 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1552,6 +1552,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) } virtual = set->engines->engines[idx]->engine; + if (intel_engine_is_virtual(virtual)) { + drm_dbg(&i915->drm, + "Bonding with virtual engines not allowed\n"); + return -EINVAL; + } + err = check_user_mbz(&ext->flags); if (err) return err; @@ -1592,18 +1598,6 @@ set_engines__bond(struct i915_user_extension __user *base, 
void *data) n, ci.engine_class, ci.engine_instance); return -EINVAL; } - - /* -* A non-virtual engine has no siblings to choose between; and -* a submit fence will always be directed to the one engine. -*/ - if (intel_engine_is_virtual(virtual)) { - err = intel_virtual_engine_attach_bond(virtual, - master, - bond); - if (err) - return err; - } } return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 0e8c320927d15..14378b28169b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -181,18 +181,6 @@ struct virtual_engine { int prio; } nodes[I915_NUM_ENGINES]; - /* -* Keep track of bonded pairs -- restrictions upon on our selection -* of physical engines any particular request may be submitted to. -* If we receive a submit-fence from a master engine, we will only -* use one of sibling_mask physical engines. -*/ - struct ve_bond { - const struct intel_engine_cs *master; - intel_engine_mask_t sibling_mask; - } *bonds; - unsigned int num_bonds; - /* And finally, which physical engines this virtual engine maps onto. 
*/ unsigned int num_siblings; struct intel_engine_cs *siblings[]; @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) intel_breadcrumbs_free(ve->base.breadcrumbs); intel_engine_free_request_pool(&ve->base); - kfree(ve->bonds); kfree(ve); } @@ -3560,33 +3547,13 @@ static void virtual_submit_request(struct i915_request *rq) spin_unlock_irqrestore(&ve->base.active.lock, flags); } -static struct ve_bond * -virtual_find_bond(struct virtual_engine *ve, - const struct intel_engine_cs *master) -{ - int i; - - for (i = 0; i < ve->num_bonds; i++) { - if (ve->bonds[i].master == master) - return &ve->bonds[i]; - } - - return NULL; -} - static void virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) { - struct virtual_engine *ve = to_virtual_engine(rq->engine); intel_engine_mask_t allowed, exec; - struct ve_bond *bond; allowed = ~to_request(signal)->engine->mask; - bond = virtual_find_bond(ve, to_request(signal)->en
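After this patch, `set_engines__bond()` rejects a virtual target up front and otherwise only validates the extension. The sketch below models that rule under one stated assumption taken from the commit message: a balanced slot with a single sibling is a plain physical engine rather than a virtual one, so the media driver's usage keeps working. `struct engine_slot`, `slot_is_virtual()` and `set_bond()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: an engine slot load-balances over num_siblings
 * physical engines. Following the commit message, a balanced slot with a
 * single sibling is treated as a plain physical engine. */
struct engine_slot {
	unsigned int num_siblings;
};

static bool slot_is_virtual(const struct engine_slot *slot)
{
	return slot->num_siblings > 1;
}

/* Sketch of the new rule: reject virtual engines first; for anything else
 * the bond is still validated but no longer changes engine selection. */
static int set_bond(const struct engine_slot *slot)
{
	if (slot_is_virtual(slot))
		return -EINVAL; /* "Bonding with virtual engines not allowed" */
	return 0;
}
```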
[PATCH 12/29] drm/i915/gem: Disallow creating contexts with too many engines
There's no sense in allowing userspace to create more engines than it can possibly access via execbuf. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 5e159fb526631..2b9207b557cc9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1639,11 +1639,11 @@ set_engines(struct i915_gem_context *ctx, return -EINVAL; } - /* -* Note that I915_EXEC_RING_MASK limits execbuf to only using the -* first 64 engines defined here. -*/ num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines); + /* RING_MASK has no shift so we can use it directly here */ + if (num_engines > I915_EXEC_RING_MASK + 1) + return -EINVAL; + set.engines = alloc_engines(num_engines); if (!set.engines) return -ENOMEM; -- 2.31.1
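The check relies on execbuf selecting the engine index with the unshifted `I915_EXEC_RING_MASK` bits, so at most `I915_EXEC_RING_MASK + 1` (64) slots are reachable. A userspace sketch of deriving the count from the payload size and applying the limit is shown below; the struct layouts follow the uAPI shape, while the size/alignment check and the `count_user_engines()` helper are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value from include/uapi/drm/i915_drm.h: execbuf picks an engine index
 * with these low bits, so indices 0..63 are reachable. */
#define I915_EXEC_RING_MASK (0x3f)

/* Trimmed copies of the uAPI structs used by I915_CONTEXT_PARAM_ENGINES. */
struct engine_class_instance {
	uint16_t engine_class;
	uint16_t engine_instance;
};

struct param_engines {
	uint64_t extensions;
	struct engine_class_instance engines[];
};

/* Derive the engine count from the ioctl payload size and refuse sets
 * larger than execbuf can address. */
static int count_user_engines(size_t size, unsigned int *num_engines)
{
	const size_t hdr = sizeof(struct param_engines);
	const size_t each = sizeof(struct engine_class_instance);
	size_t n;

	if (size < hdr || (size - hdr) % each)
		return -EINVAL;

	n = (size - hdr) / each;
	/* RING_MASK has no shift, so mask + 1 is the number of usable slots. */
	if (n > I915_EXEC_RING_MASK + 1)
		return -EINVAL;

	*num_engines = (unsigned int)n;
	return 0;
}
```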
[PATCH 11/29] drm/i915/request: Remove the hook from await_execution
This was only ever used for FENCE_SUBMIT automatic engine selection which was removed in the previous commit. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 3 +- drivers/gpu/drm/i915/i915_request.c | 42 --- drivers/gpu/drm/i915/i915_request.h | 4 +- 3 files changed, 9 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index efb2fa3522a42..7024adcd5cf15 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -3473,8 +3473,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (in_fence) { if (args->flags & I915_EXEC_FENCE_SUBMIT) err = i915_request_await_execution(eb.request, - in_fence, - NULL); + in_fence); else err = i915_request_await_dma_fence(eb.request, in_fence); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 970d8f4986bbe..53f23ce40dd63 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -49,7 +49,6 @@ struct execute_cb { struct irq_work work; struct i915_sw_fence *fence; - void (*hook)(struct i915_request *rq, struct dma_fence *signal); struct i915_request *signal; }; @@ -180,17 +179,6 @@ static void irq_execute_cb(struct irq_work *wrk) kmem_cache_free(global.slab_execute_cbs, cb); } -static void irq_execute_cb_hook(struct irq_work *wrk) -{ - struct execute_cb *cb = container_of(wrk, typeof(*cb), work); - - cb->hook(container_of(cb->fence, struct i915_request, submit), -&cb->signal->fence); - i915_request_put(cb->signal); - - irq_execute_cb(wrk); -} - static __always_inline void __notify_execute_cb(struct i915_request *rq, bool (*fn)(struct irq_work *wrk)) { @@ -517,17 +505,12 @@ static bool __request_in_flight(const struct i915_request *signal) static int __await_execution(struct i915_request *rq, struct i915_request *signal, - void (*hook)(struct i915_request *rq, 
- struct dma_fence *signal), gfp_t gfp) { struct execute_cb *cb; - if (i915_request_is_active(signal)) { - if (hook) - hook(rq, &signal->fence); + if (i915_request_is_active(signal)) return 0; - } cb = kmem_cache_alloc(global.slab_execute_cbs, gfp); if (!cb) @@ -537,12 +520,6 @@ __await_execution(struct i915_request *rq, i915_sw_fence_await(cb->fence); init_irq_work(&cb->work, irq_execute_cb); - if (hook) { - cb->hook = hook; - cb->signal = i915_request_get(signal); - cb->work.func = irq_execute_cb_hook; - } - /* * Register the callback first, then see if the signaler is already * active. This ensures that if we race with the @@ -1253,7 +1230,7 @@ emit_semaphore_wait(struct i915_request *to, goto await_fence; /* Only submit our spinner after the signaler is running! */ - if (__await_execution(to, from, NULL, gfp)) + if (__await_execution(to, from, gfp)) goto await_fence; if (__emit_semaphore_wait(to, from, from->fence.seqno)) @@ -1284,16 +1261,14 @@ static int intel_timeline_sync_set_start(struct intel_timeline *tl, static int __i915_request_await_execution(struct i915_request *to, - struct i915_request *from, - void (*hook)(struct i915_request *rq, - struct dma_fence *signal)) + struct i915_request *from) { int err; GEM_BUG_ON(intel_context_is_barrier(from->context)); /* Submit both requests at the same time */ - err = __await_execution(to, from, hook, I915_FENCE_GFP); + err = __await_execution(to, from, I915_FENCE_GFP); if (err) return err; @@ -1406,9 +1381,7 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence) int i915_request_await_execution(struct i915_request *rq, -struct dma_fence *fence, -void (*hook)(struct i915_request *rq, - struct dma_fence *signal)) +struct dma_fence *fence) { struct dma_fence **child = &fence; unsigned int nchild = 1; @@ -1441,8 +1414,7 @@ i915_request_await_ex
[PATCH 19/29] drm/i915: Add an i915_gem_vm_lookup helper
This is the VM equivalent of i915_gem_context_lookup. It's only used once in this patch but future patches will need to duplicate this lookup code so it's better to have it in a helper. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +- drivers/gpu/drm/i915/i915_drv.h | 14 ++ 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index d247fb223aac7..12a148ba421b6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1346,11 +1346,7 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, if (upper_32_bits(args->value)) return -ENOENT; - rcu_read_lock(); - vm = xa_load(&file_priv->vm_xa, args->value); - if (vm && !kref_get_unless_zero(&vm->ref)) - vm = NULL; - rcu_read_unlock(); + vm = i915_gem_vm_lookup(file_priv, args->value); if (!vm) return -ENOENT; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 48316d273af66..fee2342219da1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1871,6 +1871,20 @@ i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id) return ctx; } +static inline struct i915_address_space * +i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id) +{ + struct i915_address_space *vm; + + rcu_read_lock(); + vm = xa_load(&file_priv->vm_xa, id); + if (vm && !kref_get_unless_zero(&vm->ref)) + vm = NULL; + rcu_read_unlock(); + + return vm; +} + /* i915_gem_evict.c */ int __must_check i915_gem_evict_something(struct i915_address_space *vm, u64 min_size, u64 alignment, -- 2.31.1
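The core of the helper is "look the object up, then take a reference only if its refcount has not already dropped to zero". Below is a userspace model of that idea using C11 atomics; `ref_get_unless_zero()` mimics the semantics of `kref_get_unless_zero()`, the array stands in for `file_priv->vm_xa`, and the RCU read-side locking the kernel needs around the load is deliberately omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vm {
	atomic_int ref;
	int id;
};

/* Model of kref_get_unless_zero(): take a reference only if the object
 * is not already being torn down (refcount == 0). */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;
		/* On failure, 'old' is reloaded and the zero check repeats. */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Hypothetical lookup over a plain array standing in for the xarray;
 * the kernel also wraps the load in rcu_read_lock()/rcu_read_unlock(). */
static struct vm *vm_lookup(struct vm **table, size_t n, uint32_t id)
{
	struct vm *vm = id < n ? table[id] : NULL;

	if (vm && !ref_get_unless_zero(&vm->ref))
		vm = NULL;
	return vm;
}
```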
[PATCH 17/29] drm/i915/gem: Rework error handling in default_engines
Since free_engines works for partially constructed engine sets, we can use the usual goto pattern. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 10bff488444b6..f8f3f514b4265 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -420,7 +420,7 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) { const struct intel_gt *gt = &ctx->i915->gt; struct intel_engine_cs *engine; - struct i915_gem_engines *e; + struct i915_gem_engines *e, *err; enum intel_engine_id id; e = alloc_engines(I915_NUM_ENGINES); @@ -438,18 +438,21 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) ce = intel_context_create(engine); if (IS_ERR(ce)) { - __free_engines(e, e->num_engines + 1); - return ERR_CAST(ce); + err = ERR_CAST(ce); + goto free_engines; } intel_context_set_gem(ce, ctx); e->engines[engine->legacy_idx] = ce; - e->num_engines = max(e->num_engines, engine->legacy_idx); + e->num_engines = max(e->num_engines, engine->legacy_idx + 1); } - e->num_engines++; return e; + +free_engines: + free_engines(e); + return err; } void i915_gem_context_release(struct kref *ref) -- 2.31.1
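The goto pattern only works because `num_engines` is kept equal to "highest filled index + 1" at every step, so the cleanup path frees exactly the contexts installed so far and skips holes. A userspace sketch of that invariant follows; `context_create()`, the `fail_at` fault injection and the `released` counter are hypothetical, and errors are reported as `NULL` rather than the kernel's `ERR_PTR()`.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_ENGINES 4

/* Hypothetical model of i915_gem_engines: slots indexed by legacy_idx. */
struct engines {
	unsigned int num_engines;
	void *engines[NUM_ENGINES];
};

static int fail_at = -1; /* demo-only fault injection */
static int released;     /* contexts freed by free_engines() */

/* Stand-in for intel_context_create(). */
static void *context_create(int idx)
{
	return idx == fail_at ? NULL : malloc(1);
}

/* Copes with a partially built set: walks num_engines slots, skips holes. */
static void free_engines(struct engines *e)
{
	for (unsigned int i = 0; i < e->num_engines; i++) {
		if (e->engines[i]) {
			free(e->engines[i]);
			released++;
		}
	}
	free(e);
}

static struct engines *default_engines(const int *legacy_idx, int count)
{
	struct engines *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;

	for (int i = 0; i < count; i++) {
		unsigned int idx = (unsigned int)legacy_idx[i];
		void *ce;

		assert(idx < NUM_ENGINES);
		ce = context_create(legacy_idx[i]);
		if (!ce)
			goto err_free;

		e->engines[idx] = ce;
		/* Keep num_engines == highest used index + 1 after every step,
		 * so the error path sees exactly the slots filled so far. */
		if (e->num_engines < idx + 1)
			e->num_engines = idx + 1;
	}
	return e;

err_free:
	free_engines(e);
	return NULL;
}
```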
[PATCH 13/29] drm/i915: Stop manually RCU banging in reset_stats_ioctl (v2)
As far as I can tell, the only real reason for this is to avoid taking a reference to the i915_gem_context. The cost of those two atomics probably pales in comparison to the cost of the ioctl itself, so we're really not buying ourselves anything here. We're about to make context lookup a tiny bit more complicated, so let's get rid of the one hand-rolled case. Some usermode drivers such as our Vulkan driver call GET_RESET_STATS on every execbuf so the perf here could theoretically be an issue. If this ever does become a performance issue for any such userspace drivers, they can set CONTEXT_PARAM_RECOVERABLE to false and look for -EIO coming from execbuf to check for hangs instead. v2 (Daniel Vetter): - Add a comment in the commit message about recoverable contexts Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 - drivers/gpu/drm/i915/i915_drv.h | 8 +--- 2 files changed, 5 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 2b9207b557cc9..910d31cb043e9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -2090,16 +2090,13 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, struct drm_i915_private *i915 = to_i915(dev); struct drm_i915_reset_stats *args = data; struct i915_gem_context *ctx; - int ret; if (args->flags || args->pad) return -EINVAL; - ret = -ENOENT; - rcu_read_lock(); - ctx = __i915_gem_context_lookup_rcu(file->driver_priv, args->ctx_id); + ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id); if (!ctx) - goto out; + return -ENOENT; /* * We opt for unserialised reads here. 
This may result in tearing @@ -2116,10 +2113,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, args->batch_active = atomic_read(&ctx->guilty_count); args->batch_pending = atomic_read(&ctx->active_count); - ret = 0; -out: - rcu_read_unlock(); - return ret; + i915_gem_context_put(ctx); + return 0; } /* GEM context-engines iterator: for_each_gem_engine() */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 39b5e019c1a5b..48316d273af66 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1857,19 +1857,13 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags); -static inline struct i915_gem_context * -__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id) -{ - return xa_load(&file_priv->context_xa, id); -} - static inline struct i915_gem_context * i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id) { struct i915_gem_context *ctx; rcu_read_lock(); - ctx = __i915_gem_context_lookup_rcu(file_priv, id); + ctx = xa_load(&file_priv->context_xa, id); if (ctx && !kref_get_unless_zero(&ctx->ref)) ctx = NULL; rcu_read_unlock(); -- 2.31.1
[PATCH 10/29] drm/i915/gem: Remove engine auto-magic with FENCE_SUBMIT (v2)
Even though FENCE_SUBMIT is only documented to wait until the request in the in-fence starts instead of waiting until it completes, it has a bit more magic than that. If FENCE_SUBMIT is used to submit something to a balanced engine, we would wait to assign engines until the primary request was ready to start and then attempt to assign it to a different engine than the primary. There is an IGT test (the bonded-slice subtest of gem_exec_balancer) which exercises this by submitting a primary batch to a specific VCS and then using FENCE_SUBMIT to submit a secondary which can run on any VCS and have i915 figure out which VCS to run it on such that they can run in parallel. However, this functionality has never been used in the real world. The media driver (the only user of FENCE_SUBMIT) always picks exactly two physical engines to bond and never asks us to pick which to use. v2 (Daniel Vetter): - Mention the exact IGT test this breaks Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_types.h| 7 --- .../drm/i915/gt/intel_execlists_submission.c| 17 - 3 files changed, 1 insertion(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index d640bba6ad9ab..efb2fa3522a42 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (args->flags & I915_EXEC_FENCE_SUBMIT) err = i915_request_await_execution(eb.request, in_fence, - eb.engine->bond_execute); + NULL); else err = i915_request_await_dma_fence(eb.request, in_fence); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 883bafc449024..68cfe5080325c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -446,13 
+446,6 @@ struct intel_engine_cs { */ void(*submit_request)(struct i915_request *rq); - /* -* Called on signaling of a SUBMIT_FENCE, passing along the signaling -* request down to the bonded pairs. -*/ - void(*bond_execute)(struct i915_request *rq, - struct dma_fence *signal); - /* * Call when the priority on a request has changed and it and its * dependencies may need rescheduling. Note the request itself may diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 14378b28169b7..635d6d2494d26 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3547,22 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq) spin_unlock_irqrestore(&ve->base.active.lock, flags); } -static void -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) -{ - intel_engine_mask_t allowed, exec; - - allowed = ~to_request(signal)->engine->mask; - - /* Restrict the bonded request to run on only the available engines */ - exec = READ_ONCE(rq->execution_mask); - while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) - ; - - /* Prevent the master from being re-run on the bonded engines */ - to_request(signal)->execution_mask &= ~allowed; -} - struct intel_context * intel_execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) @@ -3616,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, ve->base.schedule = i915_schedule; ve->base.submit_request = virtual_submit_request; - ve->base.bond_execute = virtual_bond_execute; INIT_LIST_HEAD(virtual_queue(ve)); ve->base.execlists.queue_priority_hint = INT_MIN; -- 2.31.1
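To make the removed "auto-magic" concrete, here is a userspace model of what the deleted `virtual_bond_execute()` did: when the primary started, the secondary's allowed-engine mask was narrowed to exclude the primary's engine with a compare-exchange loop, and the primary was pinned to its own engine. The C11 atomics and the `struct request` fields are assumptions of this sketch, not the driver's types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t engine_mask_t;

struct request {
	_Atomic engine_mask_t execution_mask; /* engines it may still run on */
	engine_mask_t engine;                 /* engine it was assigned (one bit) */
};

/* Model of the removed hook: run when the primary ('signal') starts. */
static void bond_execute(struct request *rq, struct request *signal)
{
	engine_mask_t allowed = ~signal->engine;
	engine_mask_t exec = atomic_load(&rq->execution_mask);

	/* Restrict the bonded request to engines other than the primary's;
	 * on failure 'exec' is refreshed and the narrowing is retried. */
	while (!atomic_compare_exchange_weak(&rq->execution_mask, &exec,
					     exec & allowed))
		;

	/* Keep the primary from being re-run on the bonded engines. */
	atomic_fetch_and(&signal->execution_mask, ~allowed);
}
```

With the primary on vcs0 (mask `0x1`) and a secondary balanced over vcs0 and vcs1 (mask `0x3`), the secondary ends up restricted to vcs1 (`0x2`), which is the "pick the other engine for me" behaviour the patch removes.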
[PATCH 07/29] drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
This API is entirely unnecessary and I'd love to get rid of it. If userspace wants a single timeline across multiple contexts, they can either use implicit synchronization or a syncobj, both of which existed at the time this feature landed. The justification given at the time was that it would help GL drivers which are inherently single-timeline. However, neither of our GL drivers actually wanted the feature. i965 was already in maintenance mode at the time and iris uses syncobj for everything. Unfortunately, as much as I'd love to get rid of it, it is used by the media driver so we can't do that. We can, however, do the next-best thing which is to embed a syncobj in the context and do exactly what we'd expect from userspace internally. This isn't an entirely identical implementation because it's no longer atomic if userspace races with itself by calling execbuffer2 twice simultaneously from different threads. It won't crash in that case; it just doesn't guarantee any ordering between those two submits. It also means that sync files exported from different engines on a SINGLE_TIMELINE context will have different fence contexts. This is visible to userspace if it looks at the obj_name field of sync_fence_info. Moving SINGLE_TIMELINE to a syncobj emulation has a couple of technical advantages beyond mere annoyance. One is that intel_timeline is no longer an api-visible object and can remain entirely an implementation detail. This may be advantageous as we make scheduler changes going forward. Second is that, together with deleting the CLONE_CONTEXT API, we should now have a 1:1 mapping between intel_context and intel_timeline which may help us reduce locking. v2 (Tvrtko Ursulin): - Update the comment on i915_gem_context::syncobj to mention that it's an emulation and the possible race if userspace calls execbuffer2 twice on the same context concurrently. 
v2 (Jason Ekstrand): - Wrap the checks for eb.gem_context->syncobj in unlikely() - Drop the dma_fence reference - Improved commit message v3 (Jason Ekstrand): - Move the dma_fence_put() to before the error exit v4 (Tvrtko Ursulin): - Add a comment about fence contexts to the commit message Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 49 +-- .../gpu/drm/i915/gem/i915_gem_context_types.h | 14 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 16 ++ 3 files changed, 40 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 97613e529aab3..aa792c9517e16 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -67,6 +67,8 @@ #include #include +#include + #include "gt/gen6_ppgtt.h" #include "gt/intel_context.h" #include "gt/intel_context_param.h" @@ -224,10 +226,6 @@ static void intel_context_set_gem(struct intel_context *ce, ce->vm = vm; } - GEM_BUG_ON(ce->timeline); - if (ctx->timeline) - ce->timeline = intel_timeline_get(ctx->timeline); - if (ctx->sched.priority >= I915_PRIORITY_NORMAL && intel_engine_has_timeslices(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags); @@ -351,9 +349,6 @@ void i915_gem_context_release(struct kref *ref) mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); - if (ctx->timeline) - intel_timeline_put(ctx->timeline); - put_pid(ctx->pid); mutex_destroy(&ctx->mutex); @@ -570,6 +565,9 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm); + if (ctx->syncobj) + drm_syncobj_put(ctx->syncobj); + ctx->file_priv = ERR_PTR(-EBADF); /* @@ -765,33 +763,11 @@ static void __assign_ppgtt(struct i915_gem_context *ctx, i915_vm_close(vm); } -static void __set_timeline(struct intel_timeline **dst, - struct intel_timeline *src) -{ - struct intel_timeline *old = *dst; - - *dst = src ? 
intel_timeline_get(src) : NULL; - - if (old) - intel_timeline_put(old); -} - -static void __apply_timeline(struct intel_context *ce, void *timeline) -{ - __set_timeline(&ce->timeline, timeline); -} - -static void __assign_timeline(struct i915_gem_context *ctx, - struct intel_timeline *timeline) -{ - __set_timeline(&ctx->timeline, timeline); - context_apply_all(ctx, __apply_timeline, timeline); -} - static struct i915_gem_context * i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) { struct i915_gem_context *ctx; + int ret; if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE && !HAS_EXECLISTS(i915)) @@ -820,16 +796,13 @@ i915_gem_create_context(struct drm_i915_private *
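The emulation amounts to "every submit on the context waits for the fence the context's syncobj currently holds, then replaces it with its own". The single-threaded sketch below models that with plain pointers; `struct syncobj`, `struct submission` and `execbuf()` are hypothetical stand-ins, not the `drm_syncobj` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a fence records which engine produced it and
 * the syncobj just holds the most recent fence. */
struct fence {
	int engine;
};

struct syncobj {
	const struct fence *fence;
};

struct context {
	struct syncobj *syncobj; /* non-NULL only for SINGLE_TIMELINE contexts */
};

struct submission {
	struct fence out;          /* completion fence of this request */
	const struct fence *await; /* dependency added for ordering, if any */
};

static void execbuf(struct context *ctx, struct submission *s, int engine)
{
	s->out.engine = engine;
	s->await = NULL;

	if (ctx->syncobj) {
		/* Wait for whatever was submitted last on this context... */
		s->await = ctx->syncobj->fence;
		/* ...then become the new last point. These are two separate
		 * steps, so concurrent execbufs on one context get no ordering
		 * guarantee between them. */
		ctx->syncobj->fence = &s->out;
	}
}
```

Because the dependency is an ordinary fence wait, a request on one engine can be serialized behind a request on another engine, which is how one timeline spans several engines without a shared `intel_timeline`.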
[PATCH 16/29] drm/i915/gem: Add an intermediate proto_context struct
The current context uAPI allows for two methods of setting context parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The former is allowed to be called at any time while the latter happens as part of GEM_CONTEXT_CREATE. Currently, everything settable via one is settable via the other. While some params are fairly simple and setting them on a live context is harmless, such as the context priority, others are far trickier, such as the VM or the set of engines. In order to swap out the VM, for instance, we have to delay until all current in-flight work is complete, swap in the new VM, and then continue. This leads to a plethora of potential race conditions we'd really rather avoid. Unfortunately, both methods of setting the VM and engine set are in active use today so we can't simply disallow setting the VM or engine set via SET_CONTEXT_PARAM. In order to work around this wart, this commit adds a proto-context struct which contains all the context create parameters. Signed-off-by: Jason Ekstrand --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 145 ++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 22 +++ .../gpu/drm/i915/gem/selftests/mock_context.c | 16 +- 3 files changed, 153 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index fc471243aa769..10bff488444b6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -191,6 +191,97 @@ static int validate_priority(struct drm_i915_private *i915, return 0; } +static void proto_context_close(struct i915_gem_proto_context *pc) +{ + if (pc->vm) + i915_vm_put(pc->vm); + kfree(pc); +} + +static int proto_context_set_persistence(struct drm_i915_private *i915, +struct i915_gem_proto_context *pc, +bool persist) +{ + if (persist) { + /* +* Only contexts that are short-lived [that will expire or be +* reset] are allowed to survive past termination.
We require +* hangcheck to ensure that the persistent requests are healthy. +*/ + if (!i915->params.enable_hangcheck) + return -EINVAL; + + __set_bit(UCONTEXT_PERSISTENCE, &pc->user_flags); + } else { + /* To cancel a context we use "preempt-to-idle" */ + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION)) + return -ENODEV; + + /* +* If the cancel fails, we then need to reset, cleanly! +* +* If the per-engine reset fails, all hope is lost! We resort +* to a full GPU reset in that unlikely case, but realistically +* if the engine could not reset, the full reset does not fare +* much better. The damage has been done. +* +* However, if we cannot reset an engine by itself, we cannot +* cleanup a hanging persistent context without causing +* colateral damage, and we should not pretend we can by +* exposing the interface. +*/ + if (!intel_has_reset_engine(&i915->gt)) + return -ENODEV; + + __clear_bit(UCONTEXT_PERSISTENCE, &pc->user_flags); + } + + return 0; +} + +static struct i915_gem_proto_context * +proto_context_create(struct drm_i915_private *i915, unsigned int flags) +{ + struct i915_gem_proto_context *pc, *err; + + pc = kzalloc(sizeof(*pc), GFP_KERNEL); + if (!pc) + return ERR_PTR(-ENOMEM); + + if (HAS_FULL_PPGTT(i915)) { + struct i915_ppgtt *ppgtt; + + ppgtt = i915_ppgtt_create(&i915->gt); + if (IS_ERR(ppgtt)) { + drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n", + PTR_ERR(ppgtt)); + err = ERR_CAST(ppgtt); + goto proto_close; + } + pc->vm = &ppgtt->vm; + } + + pc->user_flags = 0; + __set_bit(UCONTEXT_BANNABLE, &pc->user_flags); + __set_bit(UCONTEXT_RECOVERABLE, &pc->user_flags); + proto_context_set_persistence(i915, pc, true); + pc->sched.priority = I915_PRIORITY_NORMAL; + + if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) { + if (!HAS_EXECLISTS(i915)) { + err = ERR_PTR(-EINVAL); + goto proto_close; + } + pc->single_timeline = true; + } + + return pc; + +proto_close: + proto_context_close(pc); + return err; +} + static struct i915_address_space * 
context_get_vm_rcu(struct i915_gem_context *ctx) { @@ -660,7 +751,8 @@ s
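As a rough illustration of the proto-context idea, here is a small user-space C sketch. All names (`proto_ctx`, `live_ctx`, `proto_set_vm`, `ctx_finalize`, `live_set_priority`, `live_set_vm`) are hypothetical and are not the i915 structures or functions. Create-time parameters are collected in a plain struct, and the live context is built from them in one step. The rule that a VM swap on a live context returns `-EBUSY` is an assumption of the sketch (one possible policy); this patch by itself does not change `SET_CONTEXT_PARAM` behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical create-time parameters (not the i915 struct layout). */
struct proto_ctx {
	int vm_id;            /* 0 means "pick a default VM" */
	int priority;
	bool single_timeline;
};

/* Hypothetical live context, built exactly once from a proto_ctx. */
struct live_ctx {
	int vm_id;
	int priority;
	bool single_timeline;
};

/* While the context is only a proto-context, any VM can be swapped in. */
static int proto_set_vm(struct proto_ctx *pc, int vm_id)
{
	if (vm_id <= 0)
		return -EINVAL;
	pc->vm_id = vm_id;
	return 0;
}

/* Turn the collected parameters into a live context in one step. */
static struct live_ctx *ctx_finalize(const struct proto_ctx *pc)
{
	struct live_ctx *ctx;

	assert(pc != NULL);
	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return NULL;
	ctx->vm_id = pc->vm_id ? pc->vm_id : 1; /* hypothetical default VM id */
	ctx->priority = pc->priority;
	ctx->single_timeline = pc->single_timeline;
	return ctx;
}

/* Simple params such as priority stay harmless to change later. */
static void live_set_priority(struct live_ctx *ctx, int prio)
{
	ctx->priority = prio;
}

/*
 * Sketch policy only: refuse VM swaps on a live context, since doing so
 * would require waiting for in-flight work first.
 */
static int live_set_vm(struct live_ctx *ctx, int vm_id)
{
	(void)ctx;
	(void)vm_id;
	return -EBUSY;
}
```

In this sketch nothing can be executing while the proto-context is being filled in, so changing the VM there needs no waiting or synchronisation; only the already-live context has to worry about in-flight work.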
[PATCH 15/29] drm/i915: Add gem/i915_gem_context.h to the docs
In order to prevent kernel doc warnings, also fill out docs for any missing fields and fix those that forgot the "@". Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- Documentation/gpu/i915.rst| 2 + .../gpu/drm/i915/gem/i915_gem_context_types.h | 43 --- 2 files changed, 38 insertions(+), 7 deletions(-) diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst index 486c720f38907..0529e5183982e 100644 --- a/Documentation/gpu/i915.rst +++ b/Documentation/gpu/i915.rst @@ -422,6 +422,8 @@ Batchbuffer Parsing User Batchbuffer Execution -- +.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h + .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c :doc: User command execution diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index df76767f0c41b..5f0673a2129f9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -30,19 +30,39 @@ struct i915_address_space; struct intel_timeline; struct intel_ring; +/** + * struct i915_gem_engines - A set of engines + */ struct i915_gem_engines { union { + /** @link: Link in i915_gem_context::stale::engines */ struct list_head link; + + /** @rcu: RCU to use when freeing */ struct rcu_head rcu; }; + + /** @fence: Fence used for delayed destruction of engines */ struct i915_sw_fence fence; + + /** @ctx: i915_gem_context backpointer */ struct i915_gem_context *ctx; + + /** @num_engines: Number of engines in this set */ unsigned int num_engines; + + /** @engines: Array of engines */ struct intel_context *engines[]; }; +/** + * struct i915_gem_engines_iter - Iterator for an i915_gem_engines set + */ struct i915_gem_engines_iter { + /** @idx: Index into i915_gem_engines::engines */ unsigned int idx; + + /** @engines: Engine set being iterated */ const struct i915_gem_engines *engines; }; @@ -53,10 +73,10 @@ struct i915_gem_engines_iter { * logical hardware state for a 
particular client. */ struct i915_gem_context { - /** i915: i915 device backpointer */ + /** @i915: i915 device backpointer */ struct drm_i915_private *i915; - /** file_priv: owning file descriptor */ + /** @file_priv: owning file descriptor */ struct drm_i915_file_private *file_priv; /** @@ -81,7 +101,9 @@ struct i915_gem_context { * CONTEXT_USER_ENGINES flag is set). */ struct i915_gem_engines __rcu *engines; - struct mutex engines_mutex; /* guards writes to engines */ + + /** @engines_mutex: guards writes to engines */ + struct mutex engines_mutex; /** * @syncobj: Shared timeline syncobj @@ -118,7 +140,7 @@ struct i915_gem_context { */ struct pid *pid; - /** link: place with &drm_i915_private.context_list */ + /** @link: place with &drm_i915_private.context_list */ struct list_head link; /** @@ -153,11 +175,13 @@ struct i915_gem_context { #define CONTEXT_CLOSED 0 #define CONTEXT_USER_ENGINES 1 + /** @mutex: guards everything that isn't engines or handles_vma */ struct mutex mutex; + /** @sched: scheduler parameters */ struct i915_sched_attr sched; - /** guilty_count: How many times this context has caused a GPU hang. */ + /** @guilty_count: How many times this context has caused a GPU hang. */ atomic_t guilty_count; /** * @active_count: How many times this context was active during a GPU @@ -171,15 +195,17 @@ struct i915_gem_context { unsigned long hang_timestamp[2]; #define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */ - /** remap_slice: Bitmask of cache lines that need remapping */ + /** @remap_slice: Bitmask of cache lines that need remapping */ u8 remap_slice; /** -* handles_vma: rbtree to look up our context specific obj/vma for +* @handles_vma: rbtree to look up our context specific obj/vma for * the user handle. 
(user handles are per fd, but the binding is * per vm, which may be one per context or shared with the global GTT) */ struct radix_tree_root handles_vma; + + /** @lut_mutex: Locks handles_vma */ struct mutex lut_mutex; /** @@ -191,8 +217,11 @@ struct i915_gem_context { */ char name[TASK_COMM_LEN + 8]; + /** @stale: tracks stale engines to be destroyed */ struct { + /** @lock: guards engines */ spinlock_t lock; + /** @engines: list of stale engines */ struct list_head engines; } stal
[PATCH 06/29] drm/i915: Drop the CONTEXT_CLONE API (v2)
This API allows one context to grab bits out of another context upon creation. It can be used as a short-cut for setparam(getparam()) for things like I915_CONTEXT_PARAM_VM. However, it's never been used by any real userspace. It's used by a few IGT tests and that's it. Since it doesn't add any real value (most of the stuff you can CLONE you can copy in other ways), drop it. There is one thing that this API allows you to clone which you cannot clone via getparam/setparam: timelines. However, timelines are an implementation detail of i915 and not really something that needs to be exposed to userspace. Also, sharing timelines between contexts isn't obviously useful and supporting it has the potential to complicate i915 internally. It also doesn't add any functionality that the client can't get in other ways. If a client really wants a shared timeline, they can use a syncobj and set it as an in and out fence on every submit. v2 (Jason Ekstrand): - More detailed commit message Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 199 +- .../drm/i915/gt/intel_execlists_submission.c | 28 --- .../drm/i915/gt/intel_execlists_submission.h | 3 - include/uapi/drm/i915_drm.h | 16 +- 4 files changed, 6 insertions(+), 240 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 6f1e5c2c5b113..97613e529aab3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1957,207 +1957,14 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data) return ctx_setparam(arg->fpriv, arg->ctx, &local.param); } -static int clone_engines(struct i915_gem_context *dst, -struct i915_gem_context *src) +static int invalid_ext(struct i915_user_extension __user *ext, void *data) { - struct i915_gem_engines *clone, *e; - bool user_engines; - unsigned long n; - - e = __context_engines_await(src, &user_engines); - 
if (!e) - return -ENOENT; - - clone = alloc_engines(e->num_engines); - if (!clone) - goto err_unlock; - - for (n = 0; n < e->num_engines; n++) { - struct intel_engine_cs *engine; - - if (!e->engines[n]) { - clone->engines[n] = NULL; - continue; - } - engine = e->engines[n]->engine; - - /* -* Virtual engines are singletons; they can only exist -* inside a single context, because they embed their -* HW context... As each virtual context implies a single -* timeline (each engine can only dequeue a single request -* at any time), it would be surprising for two contexts -* to use the same engine. So let's create a copy of -* the virtual engine instead. -*/ - if (intel_engine_is_virtual(engine)) - clone->engines[n] = - intel_execlists_clone_virtual(engine); - else - clone->engines[n] = intel_context_create(engine); - if (IS_ERR_OR_NULL(clone->engines[n])) { - __free_engines(clone, n); - goto err_unlock; - } - - intel_context_set_gem(clone->engines[n], dst); - } - clone->num_engines = n; - i915_sw_fence_complete(&e->fence); - - /* Serialised by constructor */ - engines_idle_release(dst, rcu_replace_pointer(dst->engines, clone, 1)); - if (user_engines) - i915_gem_context_set_user_engines(dst); - else - i915_gem_context_clear_user_engines(dst); - return 0; - -err_unlock: - i915_sw_fence_complete(&e->fence); - return -ENOMEM; -} - -static int clone_flags(struct i915_gem_context *dst, - struct i915_gem_context *src) -{ - dst->user_flags = src->user_flags; - return 0; -} - -static int clone_schedattr(struct i915_gem_context *dst, - struct i915_gem_context *src) -{ - dst->sched = src->sched; - return 0; -} - -static int clone_sseu(struct i915_gem_context *dst, - struct i915_gem_context *src) -{ - struct i915_gem_engines *e = i915_gem_context_lock_engines(src); - struct i915_gem_engines *clone; - unsigned long n; - int err; - - /* no locking required; sole access under constructor*/ - clone = __context_engines_static(dst); - if (e->num_engines != clone->num_engines) { - err = 
-EINVAL; - goto unlock; - } - - for (n = 0; n < e->num_engines; n++) { - struct intel_context *ce = e->engines[n]; - -
[PATCH 05/29] drm/i915/gem: Return void from context_apply_all
None of the callbacks we use with it return an error code anymore; they all return 0 unconditionally. Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 26 +++-- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 9a8a96e4346e4..6f1e5c2c5b113 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -718,32 +718,25 @@ __context_engines_await(const struct i915_gem_context *ctx, return engines; } -static int +static void context_apply_all(struct i915_gem_context *ctx, - int (*fn)(struct intel_context *ce, void *data), + void (*fn)(struct intel_context *ce, void *data), void *data) { struct i915_gem_engines_iter it; struct i915_gem_engines *e; struct intel_context *ce; - int err = 0; e = __context_engines_await(ctx, NULL); - for_each_gem_engine(ce, e, it) { - err = fn(ce, data); - if (err) - break; - } + for_each_gem_engine(ce, e, it) + fn(ce, data); i915_sw_fence_complete(&e->fence); - - return err; } -static int __apply_ppgtt(struct intel_context *ce, void *vm) +static void __apply_ppgtt(struct intel_context *ce, void *vm) { i915_vm_put(ce->vm); ce->vm = i915_vm_get(vm); - return 0; } static struct i915_address_space * @@ -783,10 +776,9 @@ static void __set_timeline(struct intel_timeline **dst, intel_timeline_put(old); } -static int __apply_timeline(struct intel_context *ce, void *timeline) +static void __apply_timeline(struct intel_context *ce, void *timeline) { __set_timeline(&ce->timeline, timeline); - return 0; } static void __assign_timeline(struct i915_gem_context *ctx, @@ -1841,19 +1833,17 @@ set_persistence(struct i915_gem_context *ctx, return __context_set_persistence(ctx, args->value); } -static int __apply_priority(struct intel_context *ce, void *arg) +static void __apply_priority(struct intel_context *ce, void *arg) { struct i915_gem_context 
*ctx = arg; if (!intel_engine_has_timeslices(ce->engine)) - return 0; + return; if (ctx->sched.priority >= I915_PRIORITY_NORMAL) intel_context_set_use_semaphores(ce); else intel_context_clear_use_semaphores(ce); - - return 0; } static int set_priority(struct i915_gem_context *ctx, -- 2.31.1
[PATCH 04/29] drm/i915/gem: Set the watchdog timeout directly in intel_context_set_gem (v2)
Instead of handling it like a context param, unconditionally set it when intel_contexts are created. For years we've had the idea of a watchdog uAPI floating about. The aim was for media, so that they could set very tight deadlines for their transcode jobs, so that if you have a corrupt bitstream (especially for decoding) you don't hang your desktop too hard. But it's been stuck in limbo since forever, and this simplifies things a bit in preparation for the proto-context work. If we decide to actually make said uAPI a reality, we can do it through the proto-context easily enough. This does mean that we now read the request_timeout_ms param once per engine, when engines are created, instead of once at context creation. If someone changes request_timeout_ms between creating a context and setting engines, it will mean that they get the new timeout. If someone races setting request_timeout_ms and context creation, they can theoretically end up with different timeouts on different engines. However, since both of these are fairly harmless and require changing kernel params, we don't care.
v2 (Tvrtko Ursulin): - Add a comment about races with request_timeout_ms Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 44 +++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 4 -- drivers/gpu/drm/i915/gt/intel_context_param.h | 3 +- 3 files changed, 7 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 868c18c08a0b1..9a8a96e4346e4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -232,7 +232,12 @@ static void intel_context_set_gem(struct intel_context *ce, intel_engine_has_timeslices(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags); - intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us); + if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) && + ctx->i915->params.request_timeout_ms) { + unsigned int timeout_ms = ctx->i915->params.request_timeout_ms; + + intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000); + } } static void __free_engines(struct i915_gem_engines *e, unsigned int count) @@ -791,41 +796,6 @@ static void __assign_timeline(struct i915_gem_context *ctx, context_apply_all(ctx, __apply_timeline, timeline); } -static int __apply_watchdog(struct intel_context *ce, void *timeout_us) -{ - return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us); -} - -static int -__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us) -{ - int ret; - - ret = context_apply_all(ctx, __apply_watchdog, - (void *)(uintptr_t)timeout_us); - if (!ret) - ctx->watchdog.timeout_us = timeout_us; - - return ret; -} - -static void __set_default_fence_expiry(struct i915_gem_context *ctx) -{ - struct drm_i915_private *i915 = ctx->i915; - int ret; - - if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) || - !i915->params.request_timeout_ms) - return; - - /* Default expiry for user fences. 
*/ - ret = __set_watchdog(ctx, i915->params.request_timeout_ms * 1000); - if (ret) - drm_notice(&i915->drm, - "Failed to configure default fence expiry! (%d)", - ret); -} - static struct i915_gem_context * i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) { @@ -870,8 +840,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) intel_timeline_put(timeline); } - __set_default_fence_expiry(ctx); - trace_i915_context_create(ctx); return ctx; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 5ae71ec936f7c..676592e27e7d2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -153,10 +153,6 @@ struct i915_gem_context { */ atomic_t active_count; - struct { - u64 timeout_us; - } watchdog; - /** * @hang_timestamp: The last time(s) this context caused a GPU hang */ diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h index dffedd983693d..0c69cb42d075c 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_param.h +++ b/drivers/gpu/drm/i915/gt/intel_context_param.h @@ -10,11 +10,10 @@ #include "intel_context.h" -static inline int +static inline void intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us) { ce->watchdog.timeout_us = timeout_us; - return 0; } #endif /* INTEL_CONTEXT_PARAM_H */ -- 2.31.1
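The new code in `intel_context_set_gem()` widens `timeout_ms` to `u64` before multiplying by 1000, so the conversion to microseconds is done in 64-bit arithmetic. A minimal user-space sketch of why the cast sits where it does, assuming a 32-bit `unsigned int` and using `uint64_t` in place of the kernel's `u64`; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the conversion in intel_context_set_gem():
 *   intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000);
 */
static uint64_t timeout_ms_to_us(unsigned int timeout_ms)
{
	/* Widen first so the multiply happens in 64 bits. */
	return (uint64_t)timeout_ms * 1000;
}

/* Without the cast, the product is computed in unsigned int and can wrap. */
static uint64_t timeout_ms_to_us_narrow(unsigned int timeout_ms)
{
	return timeout_ms * 1000u;
}
```

The two only differ for timeouts above roughly 4.29 million ms (about 71 minutes), so for typical values this is a robustness detail rather than a behavioural change.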
[PATCH 01/29] drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE
This reverts commit 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction"). This API was originally added for OpenCL but the compute-runtime PR has sat open for a year without action so we can still pull it out if we want. I argue we should drop it for three reasons: 1. If the compute-runtime PR has sat open for a year, this clearly isn't that important. 2. It's a very leaky API. Ring size is an implementation detail of the current execlist scheduler and really only makes sense there. It can't apply to the older ring-buffer scheduler on pre-execlist hardware because that's shared across all contexts and it won't apply to the GuC scheduler that's in the pipeline. 3. Having userspace set a ring size in bytes is a bad solution to the problem of having too small a ring. There is no way that userspace has the information to know how to properly set the ring size so it's just going to detect the feature and always set it to the maximum of 512K. This is what the compute-runtime PR does. The scheduler in i915, on the other hand, does have the information to make an informed choice. It could detect if the ring size is a problem and grow it itself. Or, if that's too hard, we could just increase the default size from 16K to 32K or even 64K instead of relying on userspace to do it. Let's drop this API for now and, if someone decides they really care about solving this problem, they can do it properly. 
Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/Makefile | 1 - drivers/gpu/drm/i915/gem/i915_gem_context.c | 85 +-- drivers/gpu/drm/i915/gt/intel_context_param.c | 63 -- drivers/gpu/drm/i915/gt/intel_context_param.h | 3 - include/uapi/drm/i915_drm.h | 20 + 5 files changed, 4 insertions(+), 168 deletions(-) delete mode 100644 drivers/gpu/drm/i915/gt/intel_context_param.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index d0d936d9137bc..afa22338fa343 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -88,7 +88,6 @@ gt-y += \ gt/gen8_ppgtt.o \ gt/intel_breadcrumbs.o \ gt/intel_context.o \ - gt/intel_context_param.o \ gt/intel_context_sseu.o \ gt/intel_engine_cs.o \ gt/intel_engine_heartbeat.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 188dee13e017d..650364a0dae28 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1334,63 +1334,6 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, return err; } -static int __apply_ringsize(struct intel_context *ce, void *sz) -{ - return intel_context_set_ring_size(ce, (unsigned long)sz); -} - -static int set_ringsize(struct i915_gem_context *ctx, - struct drm_i915_gem_context_param *args) -{ - if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) - return -ENODEV; - - if (args->size) - return -EINVAL; - - if (!IS_ALIGNED(args->value, I915_GTT_PAGE_SIZE)) - return -EINVAL; - - if (args->value < I915_GTT_PAGE_SIZE) - return -EINVAL; - - if (args->value > 128 * I915_GTT_PAGE_SIZE) - return -EINVAL; - - return context_apply_all(ctx, -__apply_ringsize, -__intel_context_ring_size(args->value)); -} - -static int __get_ringsize(struct intel_context *ce, void *arg) -{ - long sz; - - sz = intel_context_get_ring_size(ce); - GEM_BUG_ON(sz > INT_MAX); - - return sz; /* stop on first engine */ -} - -static int 
get_ringsize(struct i915_gem_context *ctx, - struct drm_i915_gem_context_param *args) -{ - int sz; - - if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) - return -ENODEV; - - if (args->size) - return -EINVAL; - - sz = context_apply_all(ctx, __get_ringsize, NULL); - if (sz < 0) - return sz; - - args->value = sz; - return 0; -} - int i915_gem_user_to_context_sseu(struct intel_gt *gt, const struct drm_i915_gem_context_param_sseu *user, @@ -2036,11 +1979,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_persistence(ctx, args); break; - case I915_CONTEXT_PARAM_RINGSIZE: - ret = set_ringsize(ctx, args); - break; - case I915_CONTEXT_PARAM_BAN_PERIOD: + case I915_CONTEXT_PARAM_RINGSIZE: default: ret = -EINVAL; break; @@ -2068,18 +2008,6 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data) return ctx_setparam(arg->fpr
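For reference, the validation that the removed `set_ringsize()` performed can be restated as a standalone check. This sketch assumes `I915_GTT_PAGE_SIZE` is 4 KiB, which makes the 128-page upper bound equal to the 512K maximum that the commit message says userspace would always pick. `check_ring_size` and `GTT_PAGE_SIZE` are hypothetical names, and the `args->size` check is omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: a 4 KiB GTT page. */
#define GTT_PAGE_SIZE 4096u

/*
 * Restates the checks from the removed set_ringsize():
 * page aligned, at least one page, at most 128 pages (512 KiB).
 */
static int check_ring_size(uint64_t size)
{
	if (size % GTT_PAGE_SIZE)
		return -EINVAL;          /* not page aligned */
	if (size < GTT_PAGE_SIZE)
		return -EINVAL;          /* smaller than one page */
	if (size > 128 * (uint64_t)GTT_PAGE_SIZE)
		return -EINVAL;          /* larger than 512 KiB */
	return 0;
}
```

Every accepted value is simply a page multiple in that range; nothing in the interface tells userspace which one the scheduler actually needs.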
[PATCH 03/29] drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP
The idea behind this param is to support OpenCL drivers with relocations because OpenCL reserves 0x0 for NULL and, if we placed memory there, it would confuse CL kernels. It was originally sent out as part of a patch series including libdrm [1] and Beignet [2] support. However, the libdrm and Beignet patches never landed in their respective upstream projects so this API has never been used. It's never been used in Mesa or any other driver, either. Dropping this API allows us to delete a small bit of code. [1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html [2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 16 ++-- .../gpu/drm/i915/gem/i915_gem_context_types.h| 1 - drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 include/uapi/drm/i915_drm.h | 4 4 files changed, 6 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index ec999b7ca50f4..868c18c08a0b1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1920,15 +1920,6 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, int ret = 0; switch (args->param) { - case I915_CONTEXT_PARAM_NO_ZEROMAP: - if (args->size) - ret = -EINVAL; - else if (args->value) - set_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - else - clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - break; - case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE: if (args->size) ret = -EINVAL; @@ -1978,6 +1969,7 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_persistence(ctx, args); break; + case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: case I915_CONTEXT_PARAM_RINGSIZE: default: @@ -2358,11 +2350,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, return -ENOENT; switch (args->param) { - case 
I915_CONTEXT_PARAM_NO_ZEROMAP: - args->size = 0; - args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags); - break; - case I915_CONTEXT_PARAM_GTT_SIZE: args->size = 0; rcu_read_lock(); @@ -2410,6 +2397,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, args->value = i915_gem_context_is_persistent(ctx); break; + case I915_CONTEXT_PARAM_NO_ZEROMAP: case I915_CONTEXT_PARAM_BAN_PERIOD: case I915_CONTEXT_PARAM_RINGSIZE: default: diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 340473aa70de0..5ae71ec936f7c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -129,7 +129,6 @@ struct i915_gem_context { * @user_flags: small set of booleans controlled by the user */ unsigned long user_flags; -#define UCONTEXT_NO_ZEROMAP0 #define UCONTEXT_NO_ERROR_CAPTURE 1 #define UCONTEXT_BANNABLE 2 #define UCONTEXT_RECOVERABLE 3 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 297143511f99b..b812f313422a9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -290,7 +290,6 @@ struct i915_execbuffer { struct intel_context *reloc_context; u64 invalid_flags; /** Set of execobj.flags that are invalid */ - u32 context_flags; /** Set of execobj.flags to insert from the ctx */ u64 batch_len; /** Length of batch within object */ u32 batch_start_offset; /** Location within object of batch */ @@ -541,9 +540,6 @@ eb_validate_vma(struct i915_execbuffer *eb, entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP; } - if (!(entry->flags & EXEC_OBJECT_PINNED)) - entry->flags |= eb->context_flags; - return 0; } @@ -750,10 +746,6 @@ static int eb_select_context(struct i915_execbuffer *eb) if (rcu_access_pointer(ctx->vm)) eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT; - eb->context_flags = 0; - if 
(test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags)) - eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS; - return 0; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index ad8f1a0f587f6..e527f5f7e0dea 100644 --- a/include/uapi/drm/i915
[PATCH 02/29] drm/i915: Stop storing the ring size in the ring pointer (v2)
Previously, we were storing the ring size in the ring pointer before it was actually allocated. We would then guard setting the ring size on checking for CONTEXT_ALLOC_BIT. This is error-prone at best and really only saves us a few bytes on something that already burns at least 4K. Instead, this patch adds a new ring_size field and makes everything use that. v2 (Daniel Vetter): - Replace 512 * SZ_4K with SZ_2M Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +-- drivers/gpu/drm/i915/gt/intel_context.c | 3 ++- drivers/gpu/drm/i915/gt/intel_context.h | 5 - drivers/gpu/drm/i915/gt/intel_context_types.h | 1 + drivers/gpu/drm/i915/gt/intel_lrc.c | 2 +- drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 +- drivers/gpu/drm/i915/gt/selftest_mocs.c | 2 +- drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +- drivers/gpu/drm/i915/gvt/scheduler.c | 7 ++- 9 files changed, 10 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 650364a0dae28..ec999b7ca50f4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -211,8 +211,7 @@ static void intel_context_set_gem(struct intel_context *ce, GEM_BUG_ON(rcu_access_pointer(ce->gem_context)); RCU_INIT_POINTER(ce->gem_context, ctx); - if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) - ce->ring = __intel_context_ring_size(SZ_16K); + ce->ring_size = SZ_16K; if (rcu_access_pointer(ctx->vm)) { struct i915_address_space *vm; diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 4033184f13b9f..bd63813c8a802 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -371,7 +371,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) ce->engine = engine; ce->ops = engine->cops; ce->sseu = engine->sseu; - ce->ring = 
__intel_context_ring_size(SZ_4K); + ce->ring = NULL; + ce->ring_size = SZ_4K; ewma_runtime_init(&ce->runtime.avg); diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index f83a73a2b39fc..b10cbe8fee992 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -175,11 +175,6 @@ int intel_context_prepare_remote_request(struct intel_context *ce, struct i915_request *intel_context_create_request(struct intel_context *ce); -static inline struct intel_ring *__intel_context_ring_size(u64 sz) -{ - return u64_to_ptr(struct intel_ring, sz); -} - static inline bool intel_context_is_barrier(const struct intel_context *ce) { return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index ed8c447a7346b..90026c1771055 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -82,6 +82,7 @@ struct intel_context { spinlock_t signal_lock; /* protects signals, the list of requests */ struct i915_vma *state; + u32 ring_size; struct intel_ring *ring; struct intel_timeline *timeline; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index aafe2a4df4960..890b43b296a90 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -845,7 +845,7 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine) if (IS_ERR(vma)) return PTR_ERR(vma); - ring = intel_engine_create_ring(engine, (unsigned long)ce->ring); + ring = intel_engine_create_ring(engine, ce->ring_size); if (IS_ERR(ring)) { err = PTR_ERR(ring); goto err_vma; diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 1081cd36a2bd3..01d9896dd4844 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ 
-2793,7 +2793,7 @@ static int __live_preempt_ring(struct intel_engine_cs *engine, goto err_ce; } - tmp->ring = __intel_context_ring_size(ring_sz); + tmp->ring_size = ring_sz; err = intel_context_pin(tmp); if (err) { diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c index e55a887d11e2b..f343fa5fd986f 100644 --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c @@ -28,7 +28,7 @@ static struct intel_context *mocs_context_create(struct intel_engine_cs *engine) return ce; /
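The old trick and its replacement can be compared in a small user-space sketch. The structs and functions here are hypothetical; `allocated` stands in for `CONTEXT_ALLOC_BIT`, and the integer-to-pointer cast mimics what `__intel_context_ring_size()` (via `u64_to_ptr()`) did:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct ring {
	uint32_t size;
};

/* Old scheme: before allocation, the pointer field carries a size. */
struct ctx_old {
	bool allocated;        /* stands in for CONTEXT_ALLOC_BIT */
	struct ring *ring;     /* a size OR a real pointer, depending on 'allocated' */
};

/* New scheme: a separate field, so 'ring' is always NULL or valid. */
struct ctx_new {
	uint32_t ring_size;
	struct ring *ring;
};

static void old_set_size(struct ctx_old *ce, uint32_t sz)
{
	/* Must be guarded: after allocation this would clobber a real pointer. */
	if (!ce->allocated)
		ce->ring = (struct ring *)(uintptr_t)sz;
}

static int old_alloc(struct ctx_old *ce)
{
	uint32_t sz = (uint32_t)(uintptr_t)ce->ring; /* decode the size */
	struct ring *r = malloc(sizeof(*r));

	if (!r)
		return -ENOMEM;
	r->size = sz;
	ce->ring = r;
	ce->allocated = true;
	return 0;
}

static int new_alloc(struct ctx_new *ce)
{
	struct ring *r = malloc(sizeof(*r));

	if (!r)
		return -ENOMEM;
	r->size = ce->ring_size;  /* no decoding, no guard needed */
	ce->ring = r;
	return 0;
}
```

In the old layout every writer has to check the allocation state before touching `ring`, and every reader has to know whether it holds a size or a pointer; the separate `ring_size` field removes both obligations at the cost of a small extra field.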