Re: [PATCH] drm/i915/gt: Report full vm address range
Hi Andi,

In Mesa we've been relying on I915_CONTEXT_PARAM_GTT_SIZE, so as long as that is adjusted by the kernel, we should be able to continue working without issues.

Acked-by: Lionel Landwerlin

Thanks,

-Lionel

On 13/03/2024 21:39, Andi Shyti wrote:

Commit 9bb66c179f50 ("drm/i915: Reserve some kernel space per vm") has reserved an object for kernel space usage. Userspace, though, needs to know the full address range.

Fixes: 9bb66c179f50 ("drm/i915: Reserve some kernel space per vm")
Signed-off-by: Andi Shyti
Cc: Andrzej Hajda
Cc: Chris Wilson
Cc: Lionel Landwerlin
Cc: Michal Mrozek
Cc: Nirmoy Das
Cc:  # v6.2+
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index fa46d2308b0e..d76831f50106 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -982,8 +982,9 @@ static int gen8_init_rsvd(struct i915_address_space *vm)
 	vm->rsvd.vma = i915_vma_make_unshrinkable(vma);
 	vm->rsvd.obj = obj;
-	vm->total -= vma->node.size;
+
 	return 0;
+
 unref:
 	i915_gem_object_put(obj);
 	return ret;
Re: [Intel-gfx] [PATCH 1/2] drm/i915/perf: Subtract gtt_offset from hw_tail
On 18/07/2023 05:43, Ashutosh Dixit wrote:

The code in oa_buffer_check_unlocked() is correct only if the OA buffer is 16 MB aligned (which seems to be the case today in i915). However, when the 16 MB alignment is dropped and we "Subtract partial amount off the tail", the "& (OA_BUFFER_SIZE - 1)" operation in OA_TAKEN() will result in an incorrect hw_tail value. Therefore hw_tail must be brought to the same base as head and read_tail prior to OA_TAKEN, by subtracting gtt_offset from hw_tail.

Signed-off-by: Ashutosh Dixit
---
 drivers/gpu/drm/i915/i915_perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 49c6f1ff11284..f7888a44d1284 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -565,6 +565,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	partial_report_size %= report_size;
 
 	/* Subtract partial amount off the tail */
+	hw_tail -= gtt_offset;
 	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
 
 	/* NB: The head we observe here might effectively be a little

You should squash this patch with the next one. Otherwise further down this function there is another:

	hw_tail -= gtt_offset;

-Lionel
Re: [Intel-gfx] [PATCH] drm/i915: Introduce Wa_14011274333
On 13/07/2023 02:34, Matt Atwood wrote:

Wa_14011274333 applies to RKL, ADL-S, ADL-P and TGL. Allocate a buffer pinned to GGTT and add the WA to restore the impacted registers.

v2: use correct lineage number, more generically apply workarounds for all registers impacted, move workaround to gt/intel_workarounds.c (MattR)

Based off a patch by Tilak Tangudu.

Signed-off-by: Matt Atwood

I applied this patch to drm-tip and as far as I can tell it doesn't fix the problem of the SAMPLER_MODE register losing its bit0 programming.

-Lionel

---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  5 ++
 drivers/gpu/drm/i915/gt/intel_rc6.c         | 63 +
 drivers/gpu/drm/i915/gt/intel_rc6_types.h   |  3 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 40 +
 4 files changed, 111 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 718cb2c80f79e..eaee35ecbc8d3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -83,6 +83,11 @@
 #define   MTL_MCR_GROUPID			REG_GENMASK(11, 8)
 #define   MTL_MCR_INSTANCEID			REG_GENMASK(3, 0)
 
+#define CTX_WA_PTR				_MMIO(0x2058)
+#define   CTX_WA_PTR_ADDR_MASK			REG_GENMASK(31, 12)
+#define   CTX_WA_TYPE_MASK			REG_GENMASK(4, 3)
+#define   CTX_WA_VALID				REG_BIT(0)
+
 #define IPEIR_I965				_MMIO(0x2064)
 #define IPEHR_I965				_MMIO(0x2068)

diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index 58bb1c55294c9..6baa341814da7 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -14,6 +14,7 @@
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_regs.h"
+#include "intel_gpu_commands.h"
 #include "intel_pcode.h"
 #include "intel_rc6.h"
@@ -53,6 +54,65 @@ static struct drm_i915_private *rc6_to_i915(struct intel_rc6 *rc)
 	return rc6_to_gt(rc)->i915;
 }
 
+static int rc6_wa_bb_ctx_init(struct intel_rc6 *rc6)
+{
+	struct drm_i915_private *i915 = rc6_to_i915(rc6);
+	struct intel_gt *gt = rc6_to_gt(rc6);
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	void *batch;
+	struct i915_gem_ww_ctx ww;
+	int err;
+
+	obj = i915_gem_object_create_shmem(i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err;
+	}
+	rc6->vma = vma;
+	i915_gem_ww_ctx_init(&ww, true);
+retry:
+	err = i915_gem_object_lock(obj, &ww);
+	if (!err)
+		err = i915_ggtt_pin(rc6->vma, &ww, 0, PIN_HIGH);
+	if (err)
+		goto err_ww_fini;
+
+	batch = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(batch)) {
+		err = PTR_ERR(batch);
+		goto err_unpin;
+	}
+	rc6->rc6_wa_bb = batch;
+	return 0;
+err_unpin:
+	if (err)
+		i915_vma_unpin(rc6->vma);
+err_ww_fini:
+	if (err == -EDEADLK) {
+		err = i915_gem_ww_ctx_backoff(&ww);
+		if (!err)
+			goto retry;
+	}
+	i915_gem_ww_ctx_fini(&ww);
+
+	if (err)
+		i915_vma_put(rc6->vma);
+err:
+	i915_gem_object_put(obj);
+	return err;
+}
+
+void rc6_wa_bb_ctx_wa_fini(struct intel_rc6 *rc6)
+{
+	i915_vma_unpin(rc6->vma);
+	i915_vma_put(rc6->vma);
+}
+
 static void gen11_rc6_enable(struct intel_rc6 *rc6)
 {
 	struct intel_gt *gt = rc6_to_gt(rc6);
@@ -616,6 +676,9 @@ void intel_rc6_init(struct intel_rc6 *rc6)
 		err = chv_rc6_init(rc6);
 	else if (IS_VALLEYVIEW(i915))
 		err = vlv_rc6_init(rc6);
+	else if ((GRAPHICS_VER_FULL(i915) >= IP_VER(12, 0)) &&
+		 (GRAPHICS_VER_FULL(i915) <= IP_VER(12, 70)))
+		err = rc6_wa_bb_ctx_init(rc6);
 	else
 		err = 0;

diff --git a/drivers/gpu/drm/i915/gt/intel_rc6_types.h b/drivers/gpu/drm/i915/gt/intel_rc6_types.h
index cd4587098162a..643fd4e839ad4 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_rc6_types.h
@@ -33,6 +33,9 @@ struct intel_rc6 {
 	struct drm_i915_gem_object *pctx;
 
+	u32 *rc6_wa_bb;
+	struct i915_vma *vma;
+
 	bool supported : 1;
 	bool enabled : 1;
 	bool manual : 1;

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 4d2dece960115..d20afb318d857 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -14,6 +14,7 @@
 #include "intel_gt_regs.h"
 #include "intel_ring.h"
 #include "intel_workarounds.h"
+#include "intel_r
Re: [Intel-gfx] [PATCH] drm/i915/perf: Clear out entire reports after reading if not power of 2 size
On 22/05/2023 23:17, Ashutosh Dixit wrote:

Clearing out the report id and timestamp as a means to detect unlanded reports only works if the report size is a power of 2. That is, only when the report size is a sub-multiple of the OA buffer size can we be certain that reports will land at the same place each time in the OA buffer (after rewind). If the report size is not a power of 2, we need to zero out the entire report to be able to detect unlanded reports reliably.

Cc: Umesh Nerlige Ramappa
Signed-off-by: Ashutosh Dixit

Sad but necessary, unfortunately.

Reviewed-by: Lionel Landwerlin

---
 drivers/gpu/drm/i915/i915_perf.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 19d5652300eeb..58284156428dc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -877,12 +877,17 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			stream->oa_buffer.last_ctx_id = ctx_id;
 		}
 
-		/*
-		 * Clear out the report id and timestamp as a means to detect unlanded
-		 * reports.
-		 */
-		oa_report_id_clear(stream, report32);
-		oa_timestamp_clear(stream, report32);
+		if (is_power_of_2(report_size)) {
+			/*
+			 * Clear out the report id and timestamp as a means
+			 * to detect unlanded reports.
+			 */
+			oa_report_id_clear(stream, report32);
+			oa_timestamp_clear(stream, report32);
+		} else {
+			/* Zero out the entire report */
+			memset(report32, 0, report_size);
+		}
 	}
 
 	if (start_offset != *offset) {
Re: [Intel-gfx] [PATCH] i915/perf: Avoid reading OA reports before they land
Hi Umesh,

Looks like there is still a problem with the if block moving stream->oa_buffer.tail forward. An application not doing any polling would still run into the same problem.

If I understand this change correctly, it means the time based workaround doesn't work. We need to actually check the report's content before moving the software tracked tail. If that's the case, then maybe we should just delete that code.

-Lionel

On 20/05/2023 01:56, Umesh Nerlige Ramappa wrote:

On DG2, capturing OA reports while running heavy render workloads sometimes results in invalid OA reports where 64-byte chunks inside reports have stale values. Under memory pressure, high OA sampling rates (13.3 us) and heavy render workload, occasionally the OA HW TAIL pointer does not progress as fast as the sampling rate. When these glitches occur, the TAIL pointer takes approx. 200us to progress. While this is expected behavior from the HW perspective, invalid reports are not expected.

In oa_buffer_check_unlocked(), when we execute the if condition, we are updating the oa_buffer.tail to the aging tail and then setting pollin based on this tail value; however, we do not have a chance to rewind and validate the reports prior to setting pollin. The validation happens in a subsequent call to oa_buffer_check_unlocked(). If a read occurs before this validation, then we end up reading reports up until this oa_buffer.tail value, which includes invalid reports.

Though found on DG2, this affects all platforms. Set pollin only in the else condition in oa_buffer_check_unlocked().

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7484
Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7757
Signed-off-by: Umesh Nerlige Ramappa
---
 drivers/gpu/drm/i915/i915_perf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 19d5652300ee..61536e3c4ac9 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -545,7 +545,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format->size;
 	unsigned long flags;
-	bool pollin;
+	bool pollin = false;
 	u32 hw_tail;
 	u64 now;
 	u32 partial_report_size;
@@ -620,10 +620,10 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		stream->oa_buffer.tail = gtt_offset + tail;
 		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
 		stream->oa_buffer.aging_timestamp = now;
-	}
 
-	pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
-			  stream->oa_buffer.head - gtt_offset) >= report_size;
+		pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
+				  stream->oa_buffer.head - gtt_offset) >= report_size;
+	}
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
Re: [Intel-gfx] [PATCH v8 6/8] drm/i915/uapi/pxp: Add a GET_PARAM for PXP
On 27/04/2023 21:19, Teres Alexis, Alan Previn wrote:

(fixed email addresses again - why is my Evolution client deteriorating??)

On Thu, 2023-04-27 at 17:18 +0000, Teres Alexis, Alan Previn wrote:

On Wed, 2023-04-26 at 15:35 -0700, Justen, Jordan L wrote:

On 2023-04-26 11:17:16, Teres Alexis, Alan Previn wrote:

alan:snip

Can you tell that pxp is in progress, but not ready yet, as a separate state from 'it will never work on this platform'? If so, maybe the status could return something like:

0: It's never going to work
1: It's ready to use
2: It's starting and should work soon

I could see an argument for treating that as a case where we could still advertise protected content support, but if we try to use it we might be in for a nasty delay.

alan: IIRC Lionel seemed okay with any permutation that would allow it to not get blocked. Daniele did ask for something similar to what you mentioned above, but he said that is non-blocking. Since both you AND Daniele have mentioned the same thing, I shall re-rev this and send that change out today. I notice most GET_PARAMs use -ENODEV for "never gonna work", so I will stick with that; 1 = ready to use and 2 = starting and should work soon sounds good. So '0' will never be returned - we just look for a positive value (from user space). I will also make a PR for the Mesa side as soon as I get it tested. Thanks for reviewing btw.

alan: I also realize that with these final touch-ups, we can go back to the original pxp-context-creation timeout of 250 milliseconds like it was on ADL, since the user space component will have this new param to check on (so even farther down from 1 sec on the last couple of revs).

Jordan, Lionel - I am thinking of creating the PR on the Mesa side to take advantage of GET_PARAM for both get-caps AND runtime creation (the latter will be useful to ensure no unnecessary delay experienced by Mesa stuck in a kernel call - which practically never happened on ADL AFAIK):

1. MESA PXP get caps:
- use GET_PARAM (any positive number shall mean it's supported).

2. MESA app-triggered PXP context creation (i.e. if caps was supported):
- use GET_PARAM to wait until the positive number switches from "2" to "1".
- now call context creation. At this point, if it fails, we know it's an actual failure.

You guys okay with the above? (I'll re-rev this kernel series first and wait on your ack or feedback before I create/test/submit a PR for the Mesa side.)

Sounds good.

Thanks,

-Lionel
Re: [Intel-gfx] [PATCH v7 6/8] drm/i915/uapi/pxp: Fix UAPI spec comments and add GET_PARAM for PXP
On 14/04/2023 18:17, Teres Alexis, Alan Previn wrote:

Hi Lionel, does this patch work for you?

Hi,

Sorry for the late answer. That looks good:

Acked-by: Lionel Landwerlin

Thanks,

-Lionel

On Mon, 2023-04-10 at 10:22 -0700, Ceraolo Spurio, Daniele wrote:

On 4/6/2023 10:44 AM, Alan Previn wrote:

alan:snip

+/*
+ * Query the status of PXP support in i915.
+ *
+ * The query can fail in the following scenarios with the listed error codes:
+ *     -ENODEV = PXP support is not available on the GPU device or in the
+ *               kernel due to missing component drivers or kernel configs.
+ *
+ * If the IOCTL is successful, the returned parameter will be set to one of
+ * the following values:
+ *     0 = PXP support may be available but underlying SOC fusing, BIOS or
+ *         firmware configuration is unknown and a PXP-context-creation would
+ *         be required for final verification of feature availability.

Would it be useful to add:

1 = PXP support is available

and start returning that after we've successfully created our first session? Not sure if userspace would use this though, since they still need to handle the 0 case anyway. I'm also ok with this patch as-is, as long as you get an ack from the userspace drivers for this interface behavior:

Reviewed-by: Daniele Ceraolo Spurio

Daniele

alan:snip
Re: [Intel-gfx] [PATCH] drm/i915: disable sampler indirect state in bindless heap
On 07/04/2023 12:32, Lionel Landwerlin wrote:

By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers those can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data.

This change forces the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, which only have to keep one copy of the border colors). This reproduces the behavior of the Windows drivers.

BSpec: 46052

Signed-off-by: Lionel Landwerlin
Cc: sta...@vger.kernel.org
Reviewed-by: Haridhar Kalvala

Screwed up the subject-prefix, but this is v4. Rebased due to another change touching the same register.

-Lionel

---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 492b3de6678d7..fd1f9cd35e9d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1145,6 +1145,7 @@
 #define   SC_DISABLE_POWER_OPTIMIZATION_EBB	REG_BIT(9)
 #define   GEN11_SAMPLER_ENABLE_HEADLESS_MSG	REG_BIT(5)
 #define   MTL_DISABLE_SAMPLER_SC_OOO		REG_BIT(3)
+#define   GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE	REG_BIT(0)
 
 #define GEN9_HALF_SLICE_CHICKEN7		MCR_REG(0xe194)
 #define   DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 6ea453ddd0116..b925ef47304b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2971,6 +2971,25 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
 	add_render_compute_tuning_settings(i915, wal);
 
+	if (GRAPHICS_VER(i915) >= 11) {
+		/* This is not a Wa (although referred to as
+		 * WaSetInidrectStateOverride in places), this allows
+		 * applications that reference sampler states through
+		 * the BindlessSamplerStateBaseAddress to have their
+		 * border color relative to DynamicStateBaseAddress
+		 * rather than BindlessSamplerStateBaseAddress.
+		 *
+		 * Otherwise SAMPLER_STATE border colors have to be
+		 * copied in multiple heaps (DynamicStateBaseAddress &
+		 * BindlessSamplerStateBaseAddress)
+		 *
+		 * BSpec: 46052
+		 */
+		wa_mcr_masked_en(wal,
+				 GEN10_SAMPLER_MODE,
+				 GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+	}
+
 	if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) ||
 	    IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER))
 		/* Wa_14017856879 */
[Intel-gfx] [PATCH] drm/i915: disable sampler indirect state in bindless heap
By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers those can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data.

This change forces the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, which only have to keep one copy of the border colors). This reproduces the behavior of the Windows drivers.

BSpec: 46052

Signed-off-by: Lionel Landwerlin
Cc: sta...@vger.kernel.org
Reviewed-by: Haridhar Kalvala
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 492b3de6678d7..fd1f9cd35e9d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1145,6 +1145,7 @@
 #define   SC_DISABLE_POWER_OPTIMIZATION_EBB	REG_BIT(9)
 #define   GEN11_SAMPLER_ENABLE_HEADLESS_MSG	REG_BIT(5)
 #define   MTL_DISABLE_SAMPLER_SC_OOO		REG_BIT(3)
+#define   GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE	REG_BIT(0)
 
 #define GEN9_HALF_SLICE_CHICKEN7		MCR_REG(0xe194)
 #define   DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 6ea453ddd0116..b925ef47304b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2971,6 +2971,25 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
 	add_render_compute_tuning_settings(i915, wal);
 
+	if (GRAPHICS_VER(i915) >= 11) {
+		/* This is not a Wa (although referred to as
+		 * WaSetInidrectStateOverride in places), this allows
+		 * applications that reference sampler states through
+		 * the BindlessSamplerStateBaseAddress to have their
+		 * border color relative to DynamicStateBaseAddress
+		 * rather than BindlessSamplerStateBaseAddress.
+		 *
+		 * Otherwise SAMPLER_STATE border colors have to be
+		 * copied in multiple heaps (DynamicStateBaseAddress &
+		 * BindlessSamplerStateBaseAddress)
+		 *
+		 * BSpec: 46052
+		 */
+		wa_mcr_masked_en(wal,
+				 GEN10_SAMPLER_MODE,
+				 GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+	}
+
 	if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) ||
 	    IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER))
 		/* Wa_14017856879 */
-- 
2.34.1
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On 04/04/2023 19:04, Yang, Fei wrote:

Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang

To comply with the design that buffer objects shall have immutable cache settings throughout their life cycle, {set, get}_caching ioctls are no longer supported from MTL onward. With that change, caching policy can only be set at object creation time. The current code applies a default (platform dependent) cache setting for all objects. However this is not optimal for performance tuning. The patch extends the existing gem_create uAPI to let the user set a PAT index for the object at creation time. The new extension is platform independent, so UMDs can switch to using this extension for older platforms as well, while {set, get}_caching are still supported on these legacy platforms for compatibility reasons.

Cc: Chris Wilson
Cc: Matt Roper
Signed-off-by: Fei Yang
Reviewed-by: Andi Shyti

Just like the protected content uAPI, there is no way for userspace to tell this feature is available other than trying to use it. Given the issues with protected content, is it not something we would want to add?

Sorry, I'm not aware of the issues with protected content, could you elaborate? There was a long discussion on the teams uAPI channel, could you comment there if any concerns?

https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b928d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed7712e255d&groupId=379f3ae1-d138-4205-bb65-d4c7d38cb481&parentMessageId=1675860924675&teamName=GSE%20OSGC&channelName=i915%20uAPI%20changes&createdTime=1675860924675&allowXTenantAccess=false

Thanks,
-Fei

We wanted to have a getparam to detect protected support and were told to detect it by trying to create a context with it. Now it appears trying to create a protected context can block for several seconds. Since we have to report capabilities to the user even before it creates protected contexts, any app is at risk of blocking.

-Lionel

Thanks,
-Lionel

---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 33
 include/uapi/drm/i915_drm.h                | 36 ++
 tools/include/uapi/drm/i915_drm.h          | 36 ++
 3 files changed, 105 insertions(+)
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang

To comply with the design that buffer objects shall have immutable cache settings throughout their life cycle, {set, get}_caching ioctls are no longer supported from MTL onward. With that change, caching policy can only be set at object creation time. The current code applies a default (platform dependent) cache setting for all objects. However this is not optimal for performance tuning. The patch extends the existing gem_create uAPI to let the user set a PAT index for the object at creation time. The new extension is platform independent, so UMDs can switch to using this extension for older platforms as well, while {set, get}_caching are still supported on these legacy platforms for compatibility reasons.

Cc: Chris Wilson
Cc: Matt Roper
Signed-off-by: Fei Yang
Reviewed-by: Andi Shyti

Just like the protected content uAPI, there is no way for userspace to tell this feature is available other than trying to use it. Given the issues with protected content, is it not something we would want to add?

Thanks,

-Lionel

---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 33
 include/uapi/drm/i915_drm.h                | 36 ++
 tools/include/uapi/drm/i915_drm.h          | 36 ++
 3 files changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index e76c9703680e..1c6e2034d28e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -244,6 +244,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	unsigned int pat_index;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -393,11 +394,39 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_pat(struct i915_user_extension __user *base, void *data)
+{
+	struct create_ext *ext_data = data;
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_create_ext_set_pat ext;
+	unsigned int max_pat_index;
+
+	BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
+		     offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	max_pat_index = INTEL_INFO(i915)->max_pat_index;
+
+	if (ext.pat_index > max_pat_index) {
+		drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
+			ext.pat_index);
+		return -EINVAL;
+	}
+
+	ext_data->pat_index = ext.pat_index;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
 };
 
+#define PAT_INDEX_NOT_SET	0x
+
 /**
  * Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -417,6 +446,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
+	ext_data.pat_index = PAT_INDEX_NOT_SET;
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
@@ -453,5 +483,8 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (ext_data.pat_index != PAT_INDEX_NOT_SET)
+		i915_gem_object_set_pat_index(obj, ext_data.pat_index);
+
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 }

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..03c5c314846e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with
Re: [Intel-gfx] [PATCH] drm/i915: disable sampler indirect state in bindless heap
On 03/04/2023 21:22, Kalvala, Haridhar wrote:

On 3/31/2023 12:35 PM, Kalvala, Haridhar wrote:

On 3/30/2023 10:49 PM, Lionel Landwerlin wrote:

On 29/03/2023 01:49, Matt Atwood wrote:

On Tue, Mar 28, 2023 at 04:14:33PM +0530, Kalvala, Haridhar wrote:

On 3/9/2023 8:56 PM, Lionel Landwerlin wrote:

By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers those can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data.

This change forces the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, which only have to keep one copy of the border colors). This reproduces the behavior of the Windows drivers.

Bspec: 46052

Sorry, missed your answer. Should I just add the Bspec number to the commit message?

Thanks,

-Lionel

Signed-off-by: Lionel Landwerlin
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 17 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 08d76aa06974c..1aaa471d08c56 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1141,6 +1141,7 @@
 #define   ENABLE_SMALLPL			REG_BIT(15)
 #define   SC_DISABLE_POWER_OPTIMIZATION_EBB	REG_BIT(9)
 #define   GEN11_SAMPLER_ENABLE_HEADLESS_MSG	REG_BIT(5)
+#define   GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE	REG_BIT(0)
 
 #define GEN9_HALF_SLICE_CHICKEN7		MCR_REG(0xe194)
 #define   DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 32aa1647721ae..734b64e714647 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2542,6 +2542,23 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 			     ENABLE_SMALLPL);
 	}
 
+	if (GRAPHICS_VER(i915) >= 11) {

Hi Lionel, not sure: should this implementation be part of "rcs_engine_wa_init" or "general_render_compute_wa_init"? I checked with Matt Roper as well; looks like this implementation should be part of "general_render_compute_wa_init".

I did send a v3 of the patch last Thursday to address this. Let me know if that's good.

Thanks,

-Lionel

+		/* This is not a Wa (although referred to as
+		 * WaSetInidrectStateOverride in places), this allows
+		 * applications that reference sampler states through
+		 * the BindlessSamplerStateBaseAddress to have their
+		 * border color relative to DynamicStateBaseAddress
+		 * rather than BindlessSamplerStateBaseAddress.
+		 *
+		 * Otherwise SAMPLER_STATE border colors have to be
+		 * copied in multiple heaps (DynamicStateBaseAddress &
+		 * BindlessSamplerStateBaseAddress)
+		 */
+		wa_mcr_masked_en(wal,
+				 GEN10_SAMPLER_MODE,

Since we are checking the condition for GEN11 or above, can this register be defined as GEN11_SAMPLER_MODE?

We use the name from the first time the register was introduced; gen 10 is fine here.

ok

+				 GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+	}
+
 	if (GRAPHICS_VER(i915) == 11) {
 		/* This is not an Wa. Enable for better image quality */
 		wa_masked_en(wal,

-- 
Regards,
Haridhar Kalvala

Regards,
MattA
Re: [Intel-gfx] [v2] drm/i915: disable sampler indirect state in bindless heap
On 30/03/2023 22:38, Matt Atwood wrote:

On Thu, Mar 30, 2023 at 12:27:33PM -0700, Matt Atwood wrote:

On Thu, Mar 30, 2023 at 08:47:40PM +0300, Lionel Landwerlin wrote:

By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers that can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to copied in 2 different places so that the same SAMPLER_STATE structure find the right data.

This change is forcing the indirect state sampler data to only be in the dynamic state pool (more convinient for userspace drivers, they

convenient

only have to have one copy of the border colors). This is reproducing the behavior of the Windows drivers.

BSpec: 46052

Assuming still good CI results..

Reviewed-by: Matt Atwood

My mistake, version 3 required. Comments inline.

Signed-off-by: Lionel Landwerlin
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  1 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 4aecb5a7b6318..f298dc461a72f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1144,6 +1144,7 @@
 #define   ENABLE_SMALLPL			REG_BIT(15)
 #define   SC_DISABLE_POWER_OPTIMIZATION_EBB	REG_BIT(9)
 #define   GEN11_SAMPLER_ENABLE_HEADLESS_MSG	REG_BIT(5)
+#define   GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE	REG_BIT(0)
 
 #define GEN9_HALF_SLICE_CHICKEN7		MCR_REG(0xe194)
 #define   DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index e7ee24bcad893..0ce1c8c23c631 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2535,6 +2535,25 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 			     ENABLE_SMALLPL);
 	}

This workaround belongs in general render workarounds, not rcs, as per the address space in i915_regs.h 0x2xxx.

#define RENDER_RING_BASE	0x02000

Thanks, makes sense.

-Lionel

+	if (GRAPHICS_VER(i915) >= 11) {
+		/* This is not a Wa (although referred to as
+		 * WaSetInidrectStateOverride in places), this allows
+		 * applications that reference sampler states through
+		 * the BindlessSamplerStateBaseAddress to have their
+		 * border color relative to DynamicStateBaseAddress
+		 * rather than BindlessSamplerStateBaseAddress.
+		 *
+		 * Otherwise SAMPLER_STATE border colors have to be
+		 * copied in multiple heaps (DynamicStateBaseAddress &
+		 * BindlessSamplerStateBaseAddress)
+		 *
+		 * BSpec: 46052
+		 */
+		wa_mcr_masked_en(wal,
+				 GEN10_SAMPLER_MODE,
+				 GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE);
+	}
+
 	if (GRAPHICS_VER(i915) == 11) {
 		/* This is not an Wa. Enable for better image quality */
 		wa_masked_en(wal,
-- 
2.34.1

MattA
[Intel-gfx] [v3] drm/i915: disable sampler indirect state in bindless heap
By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers, that can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data. This change is forcing the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, they only have to have one copy of the border colors). This is reproducing the behavior of the Windows drivers. BSpec: 46052 Signed-off-by: Lionel Landwerlin Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++ 2 files changed, 20 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 4aecb5a7b6318..f298dc461a72f 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1144,6 +1144,7 @@ #define ENABLE_SMALLPL REG_BIT(15) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) #define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5) +#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0) #define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194) #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index e7ee24bcad893..5bfc864d5fcc0 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2971,6 +2971,25 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li add_render_compute_tuning_settings(i915, wal); + if (GRAPHICS_VER(i915) >= 11) { + /* This is not a Wa (although referred to as +* WaSetInidrectStateOverride in places), this allows +* applications that reference sampler states through +* the BindlessSamplerStateBaseAddress to have
their +* border color relative to DynamicStateBaseAddress +* rather than BindlessSamplerStateBaseAddress. +* +* Otherwise SAMPLER_STATE border colors have to be +* copied in multiple heaps (DynamicStateBaseAddress & +* BindlessSamplerStateBaseAddress) +* +* BSpec: 46052 +*/ + wa_mcr_masked_en(wal, +GEN10_SAMPLER_MODE, +GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE); + } + if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) || IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0) || IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || -- 2.34.1
[Intel-gfx] [v2] drm/i915: disable sampler indirect state in bindless heap
By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers, that can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data. This change is forcing the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, they only have to have one copy of the border colors). This is reproducing the behavior of the Windows drivers. BSpec: 46052 Signed-off-by: Lionel Landwerlin Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 19 +++ 2 files changed, 20 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 4aecb5a7b6318..f298dc461a72f 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1144,6 +1144,7 @@ #define ENABLE_SMALLPL REG_BIT(15) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) #define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5) +#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0) #define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194) #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index e7ee24bcad893..0ce1c8c23c631 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2535,6 +2535,25 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) ENABLE_SMALLPL); } + if (GRAPHICS_VER(i915) >= 11) { + /* This is not a Wa (although referred to as +* WaSetInidrectStateOverride in places), this allows +* applications that reference sampler states through +* the BindlessSamplerStateBaseAddress to have their +* border color relative to
DynamicStateBaseAddress +* rather than BindlessSamplerStateBaseAddress. +* +* Otherwise SAMPLER_STATE border colors have to be +* copied in multiple heaps (DynamicStateBaseAddress & +* BindlessSamplerStateBaseAddress) +* +* BSpec: 46052 +*/ + wa_mcr_masked_en(wal, +GEN10_SAMPLER_MODE, +GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE); + } + if (GRAPHICS_VER(i915) == 11) { /* This is not an Wa. Enable for better image quality */ wa_masked_en(wal, -- 2.34.1
Re: [Intel-gfx] [PATCH] drm/i915: disable sampler indirect state in bindless heap
On 29/03/2023 01:49, Matt Atwood wrote: On Tue, Mar 28, 2023 at 04:14:33PM +0530, Kalvala, Haridhar wrote: On 3/9/2023 8:56 PM, Lionel Landwerlin wrote: By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers, that can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data. This change is forcing the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, they only have to have one copy of the border colors). This is reproducing the behavior of the Windows drivers. Bspec:46052 Sorry, missed your answer. Should I just add the Bspec number to the commit message? Thanks, -Lionel Signed-off-by: Lionel Landwerlin Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 17 + 2 files changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 08d76aa06974c..1aaa471d08c56 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1141,6 +1141,7 @@ #define ENABLE_SMALLPL REG_BIT(15) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) #define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5) +#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0) #define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194) #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 32aa1647721ae..734b64e714647 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2542,6 +2542,23 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) ENABLE_SMALLPL); } + if
(GRAPHICS_VER(i915) >= 11) { Hi Lionel, Not sure whether this implementation should be part of "rcs_engine_wa_init" or "general_render_compute_wa_init". + /* This is not a Wa (although referred to as +* WaSetInidrectStateOverride in places), this allows +* applications that reference sampler states through +* the BindlessSamplerStateBaseAddress to have their +* border color relative to DynamicStateBaseAddress +* rather than BindlessSamplerStateBaseAddress. +* +* Otherwise SAMPLER_STATE border colors have to be +* copied in multiple heaps (DynamicStateBaseAddress & +* BindlessSamplerStateBaseAddress) +*/ + wa_mcr_masked_en(wal, +GEN10_SAMPLER_MODE, Since we're checking the condition for GEN11 or above, can this register be defined as GEN11_SAMPLER_MODE? We use the name from the first time the register was introduced; gen 10 is fine here. +GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE); + } + if (GRAPHICS_VER(i915) == 11) { /* This is not an Wa. Enable for better image quality */ wa_masked_en(wal, -- Regards, Haridhar Kalvala Regards, MattA
Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup
On 26/03/2023 14:18, Rodrigo Vivi wrote: On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn wrote: alan:snip @@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp) alan:snip + if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) { + /* +* GSC-fw loading, GSC-proxy init (requiring an mei component driver) and +* HuC-fw loading must all occur first before we start requesting for PXP +* sessions. Checking HuC authentication (the last dependency) will suffice. +* Let's use a much larger 8 second timeout considering all the types of +* dependencies prior to that. +*/ + if (wait_for(intel_huc_is_authenticated(&pxp->ctrl_gt->uc.huc), 8000)) This big timeout needs an ack from userspace drivers, as intel_pxp_start is called during context creation and the current way to query if the feature is supported is to create a protected context. Unfortunately, we do need to wait to confirm that PXP is available (although in most cases it shouldn't take even close to 8 secs), because until everything is setup we're not sure if things will work as expected. I see 2 potential mitigations in case the timeout doesn't work as-is: 1) we return -EAGAIN (or another dedicated error code) to userspace if the prerequisite steps aren't done yet. This would indicate that the feature is there, but that we haven't completed the setup yet. The caller can then decide if they want to retry immediately or later. Pro: more flexibility for userspace; Cons: new interface return code. 2) we add a getparam to say if PXP is supported in HW and the support is compiled in i915. Userspace can query this as a way to check the feature support and only create the context if they actually need it for PXP operations. Pro: simpler kernel implementation; Cons: new getparam, plus even if the getparam returns true the pxp_start could later fail, so userspace needs to handle that case. alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on above issue? 
Recap: On MTL, only when creating a GEM Protected (PXP) context for the very first time after a driver load, it will be dependent on (1) loading the GSC firmware, (2) GuC loading the HuC firmware and (3) GSC authenticating the HuC fw. But step 3 also depends on additional GSC-proxy-init steps that depend on a new mei-gsc-proxy component driver. I'd used the 8 second number based on offline conversations with Daniele but that is a worst case. Alternatively, should we change UAPI instead to return -EAGAIN as per Daniele's proposal? I believe we've had the get-param conversation offline recently and the direction was to stick with attempting to create the context as it is normal in 3D UMD when it comes to testing capabilities for other features too. Thoughts? I like option 1 more. This extra return handling won't break compatibility. I like option 2 better because we have to report support as fast as we can when enumerating devices on the system for example. If I understand correctly, with the get param, most apps won't ever be blocking on any PXP stuff if they don't use it. Only the ones that require protected support might block. -Lionel
[Intel-gfx] [PATCH] drm/i915: disable sampler indirect state in bindless heap
By default the indirect state sampler data (border colors) are stored in the same heap as the SAMPLER_STATE structure. For userspace drivers, that can be 2 different heaps (dynamic state heap & bindless sampler state heap). This means that border colors have to be copied in 2 different places so that the same SAMPLER_STATE structure finds the right data. This change is forcing the indirect state sampler data to only be in the dynamic state pool (more convenient for userspace drivers, they only have to have one copy of the border colors). This is reproducing the behavior of the Windows drivers. Signed-off-by: Lionel Landwerlin Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 17 + 2 files changed, 18 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 08d76aa06974c..1aaa471d08c56 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1141,6 +1141,7 @@ #define ENABLE_SMALLPL REG_BIT(15) #define SC_DISABLE_POWER_OPTIMIZATION_EBB REG_BIT(9) #define GEN11_SAMPLER_ENABLE_HEADLESS_MSG REG_BIT(5) +#define GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE REG_BIT(0) #define GEN9_HALF_SLICE_CHICKEN7 MCR_REG(0xe194) #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA REG_BIT(15) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 32aa1647721ae..734b64e714647 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2542,6 +2542,23 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) ENABLE_SMALLPL); } + if (GRAPHICS_VER(i915) >= 11) { + /* This is not a Wa (although referred to as +* WaSetInidrectStateOverride in places), this allows +* applications that reference sampler states through +* the BindlessSamplerStateBaseAddress to have their +* border color relative to
DynamicStateBaseAddress +* rather than BindlessSamplerStateBaseAddress. +* +* Otherwise SAMPLER_STATE border colors have to be +* copied in multiple heaps (DynamicStateBaseAddress & +* BindlessSamplerStateBaseAddress) +*/ + wa_mcr_masked_en(wal, +GEN10_SAMPLER_MODE, +GEN11_INDIRECT_STATE_BASE_ADDR_OVERRIDE); + } + if (GRAPHICS_VER(i915) == 11) { /* This is not an Wa. Enable for better image quality */ wa_masked_en(wal, -- 2.34.1
Re: [Intel-gfx] [PATCH 0/6] drm/i915: Fix up and test RING_TIMESTAMP on gen4-6
On 31/10/2022 15:56, Ville Syrjala wrote: From: Ville Syrjälä Correct the ring timestamp frequency for gen4/5, and run the relevant selftests for gen4-6. I've posted at least most of this before, but stuff changed in the meantime so it needed a rebase. Ville Syrjälä (6): drm/i915: Fix cs timestamp frequency for ctg/elk/ilk drm/i915: Stop claiming cs timestamp frequency on gen2/3 drm/i915: Fix cs timestamp frequency for cl/bw drm/i915/selftests: Run MI_BB perf selftests on SNB drm/i915/selftests: Test RING_TIMESTAMP on gen4/5 drm/i915/selftests: Run the perf MI_BB tests on gen4/5 .../gpu/drm/i915/gt/intel_gt_clock_utils.c| 38 --- drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 22 +-- drivers/gpu/drm/i915/gt/selftest_gt_pm.c | 36 -- 3 files changed, 67 insertions(+), 29 deletions(-) Reviewed-by: Lionel Landwerlin
Re: [Intel-gfx] [PATCH v6 00/16] Add DG2 OA support
Thanks Umesh, Is it looking good to land? Looking forward to having this in Mesa upstream. -Lionel On 27/10/2022 01:20, Umesh Nerlige Ramappa wrote: Add OA format support for DG2 and various fixes for DG2. This series has 2 uapi changes listed below: 1) drm/i915/perf: Add OAG and OAR formats for DG2 DG2 has new OA formats defined that can be selected by the user. The UMD changes that are consumed by GPUvis are: https://patchwork.freedesktop.org/patch/504456/?series=107633&rev=5 Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893 2) drm/i915/perf: Apply Wa_18013179988 DG2 has a bug where the OA timestamp does not tick at the CS timestamp frequency. Instead it ticks at a multiple that is determined from the CTC_SHIFT value in RPM_CONFIG. Since the timestamp is used by UMD to make sense of all the counters in the report, expose the OA timestamp frequency to the user. The interface is generic and applies to all platforms. On platforms where the bug is not present, this returns the CS timestamp frequency. UMD specific changes consumed by GPUvis are: https://patchwork.freedesktop.org/patch/504464/?series=107633&rev=5 Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893 v2: - Add review comments - Update uapi changes in cover letter - Drop patches for non-production platforms drm/i915/perf: Use helpers to process reports w.r.t. OA buffer size drm/i915/perf: Add Wa_16010703925:dg2 - Drop 64-bit OA format changes for now drm/i915/perf: Parse 64bit report header formats correctly drm/i915/perf: Add Wa_1608133521:dg2 v3: - Add review comments to patches 02, 04, 05, 14 - Drop Acks v4: - Add review comments to patch 04 - Update R-bs - Add MR links to patches 02 and 12 v5: - Drop unrelated comment - Rebase and fix MCR reg write - On pre-gen12, EU flex config is saved/restored in the context image, so save/restore EU flex config only for gen12.
v6: - Fix checkpatch issues Test-with: 20221025200709.83314-1-umesh.nerlige.rama...@intel.com Signed-off-by: Umesh Nerlige Ramappa Lionel Landwerlin (1): drm/i915/perf: complete programming whitelisting for XEHPSDV Umesh Nerlige Ramappa (14): drm/i915/perf: Fix OA filtering logic for GuC mode drm/i915/perf: Add 32-bit OAG and OAR formats for DG2 drm/i915/perf: Fix noa wait predication for DG2 drm/i915/perf: Determine gen12 oa ctx offset at runtime drm/i915/perf: Enable bytes per clock reporting in OA drm/i915/perf: Simply use stream->ctx drm/i915/perf: Move gt-specific data from i915->perf to gt->perf drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers drm/i915/perf: Store a pointer to oa_format in oa_buffer drm/i915/perf: Add Wa_1508761755:dg2 drm/i915/perf: Apply Wa_18013179988 drm/i915/perf: Save/restore EU flex counters across reset drm/i915/perf: Enable OA for DG2 Vinay Belgaumkar (1): drm/i915/guc: Support OA when Wa_16011777198 is enabled drivers/gpu/drm/i915/gt/intel_engine_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 4 + drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_gt_types.h | 3 + drivers/gpu/drm/i915/gt/intel_lrc.h | 2 + drivers/gpu/drm/i915/gt/intel_sseu.c | 4 +- .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 9 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 10 + drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 66 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h | 2 + drivers/gpu/drm/i915/i915_drv.h | 5 + drivers/gpu/drm/i915/i915_getparam.c | 3 + drivers/gpu/drm/i915/i915_pci.c | 2 + drivers/gpu/drm/i915/i915_perf.c | 576 ++ drivers/gpu/drm/i915/i915_perf.h | 2 + drivers/gpu/drm/i915/i915_perf_oa_regs.h | 6 +- drivers/gpu/drm/i915/i915_perf_types.h| 47 +- drivers/gpu/drm/i915/intel_device_info.h | 2 + drivers/gpu/drm/i915/selftests/i915_perf.c| 16 +- include/uapi/drm/i915_drm.h | 10 + 20 files changed, 630 insertions(+), 141 
deletions(-)
Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
On 15/09/2022 02:13, Umesh Nerlige Ramappa wrote: On Wed, Sep 14, 2022 at 03:26:15PM -0700, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 09:39:33PM +0300, Lionel Landwerlin wrote: On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote: On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: With GuC mode of submission, GuC is in control of defining the context id field that is part of the OA reports. To filter reports, UMD and KMD must know what sw context id was chosen by GuC. There is no interface between KMD and GuC to determine this, so read the upper dword of EXECLIST_STATUS to filter/squash OA reports for the specific context. Signed-off-by: Umesh Nerlige Ramappa I assume you checked with GuC that this doesn't change as the context is running? Correct. With i915/execlist submission mode, we had to ask i915 to pin the sw_id/ctx_id. From GuC perspective, the context id can change once KMD de-registers the context and that will not happen while the context is in use. Thanks, Umesh Thanks Umesh, Maybe I should have been more precise in my question: Can the ID change while the i915-perf stream is opened? Because the ID not changing while the context is running makes sense. But since the number of available IDs is limited to 2k or something on Gfx12, it's possible the GuC has to reuse IDs if too many apps want to run during the period of time while i915-perf is active and filtering. available guc ids are 64k with 4k reserved for multi-lrc, so GuC may have to reuse ids once 60k ids are used up. Spoke to the GuC team again and if there are a lot of contexts (> 60K) running, there is a possibility of the context id being recycled. In that case, the capture would be broken. I would track this as a separate JIRA and follow up on a solution. From OA use case perspective, are we interested in monitoring just one hardware context? If we make sure this context is not stolen, are we good?
Thanks, Umesh Yep, we only care about that one ID not changing. Thanks, -Lionel Thanks, Umesh -Lionel If that's not the case then filtering is broken. -Lionel --- drivers/gpu/drm/i915/gt/intel_lrc.h | 2 + drivers/gpu/drm/i915/i915_perf.c | 141 2 files changed, 124 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h index a390f0813c8b..7111bae759f3 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h @@ -110,6 +110,8 @@ enum { #define XEHP_SW_CTX_ID_WIDTH 16 #define XEHP_SW_COUNTER_SHIFT 58 #define XEHP_SW_COUNTER_WIDTH 6 +#define GEN12_GUC_SW_CTX_ID_SHIFT 39 +#define GEN12_GUC_SW_CTX_ID_WIDTH 16 static inline void lrc_runtime_start(struct intel_context *ce) { diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f3c23fe9ad9c..735244a3aedd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) return stream->pinned_ctx; } +static int +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset) +{ + u32 *cs, cmd; + + cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; + if (GRAPHICS_VER(rq->engine->i915) >= 8) + cmd++; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + *cs++ = cmd; + *cs++ = i915_mmio_reg_offset(reg); + *cs++ = ggtt_offset; + *cs++ = 0; + + intel_ring_advance(rq, cs); + + return 0; +} + +static int +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset) +{ + struct i915_request *rq; + int err; + + rq = i915_request_create(ce); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + i915_request_get(rq); + + err = __store_reg_to_mem(rq, reg, ggtt_offset); + + i915_request_add(rq); + if (!err && i915_request_wait(rq, 0, HZ / 2) < 0) + err = -ETIME; + + i915_request_put(rq); + + return err; +} + +static int +gen12_guc_sw_ctx_id(struct 
intel_context *ce, u32 *ctx_id) +{ + struct i915_vma *scratch; + u32 *val; + int err; + + scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4); + if (IS_ERR(scratch)) + return PTR_ERR(scratch); + + err = i915_vma_sync(scratch); + if (err) + goto err_scratch; + + err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base), + i915_ggtt_offset(scratch)); + if (err) + goto err_scratch; + + val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); + if (IS_ERR(val)) { + err = PTR_ERR(val); + goto err_scratch; + } + + *ctx_id =
Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
On 06/09/2022 23:35, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 10:48:50PM +0300, Lionel Landwerlin wrote: On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Some SKUs of the same gen12 platform may have different oactxctrl offsets. For gen12, determine oactxctrl offsets at runtime. Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_perf.c | 149 ++- drivers/gpu/drm/i915/i915_perf_oa_regs.h | 2 +- 2 files changed, 120 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3526693d64fa..efa7eda83edd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream) return 0; } +#define MI_OPCODE(x) (((x) >> 23) & 0x3f) +#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0))) +#define MI_LRI_LEN(x) (((x) & 0xff) + 1) Maybe you want to put this in intel_gpu_commands.h +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX) +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset) +{ + u32 idx = *offset; + u32 len = MI_LRI_LEN(state[idx]) + idx; + + idx++; + for (; idx < len; idx += 2) + if (state[idx] == reg) + break; + + *offset = idx; + return state[idx] == reg; +} + +static u32 __context_image_offset(struct intel_context *ce, u32 reg) +{ + u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4; + u32 *state = ce->lrc_reg_state; + + for (offset = 0; offset < len; ) { + if (IS_MI_LRI_CMD(state[offset])) { I'm a bit concerned you might find other matches with this. Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE instruction, you'll iterate the instruction dword by dword because you don't know how to read its length and skip to the next one. Now some of the fields can be programmed from userspace to look like an MI_LRI header, so you start to read data in the wrong way. Unfortunately I don't have a better solution.
My only ask is that you make __find_reg_in_lri() take the context image size as a parameter so it NEVER goes over the context image. To limit the risk you should run this function only once at driver initialization and store the found offset. Hmm, didn't know that there may be non-LRI commands in the context image or that the user could add to the context image somehow. Does using the context image size alone address these issues? Even after including the size in the logic, any reason you think we would be much safer to do this from init? Is it because the context image is not touched by the user yet? The format of the image (commands in there and their offset) is fixed per HW generation. Only the data in each of the commands will vary per context. In the case of MI_LRI, the register offsets are the same for all contexts, but the value programmed will vary per context. So executing once should be enough to find the right offset, rather than every time we open the i915-perf stream. I think once you have the logic to make sure you never read outside the image it should be alright. -Lionel Thanks, Umesh Thanks, -Lionel + if (__find_reg_in_lri(state, reg, &offset)) + break; + } else { + offset++; + } + } + + return offset < len ? offset : U32_MAX; +} + +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce) +{ + i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base); + struct i915_perf *perf = &ce->engine->i915->perf; + u32 saved_offset = perf->ctx_oactxctrl_offset; + u32 offset; + + /* Do this only once. Failure is stored as offset of U32_MAX */ + if (saved_offset) + return 0; + + offset = __context_image_offset(ce, i915_mmio_reg_offset(reg)); + perf->ctx_oactxctrl_offset = offset; + + drm_dbg(&ce->engine->i915->drm, + "%s oa ctx control at 0x%08x dword offset\n", + ce->engine->name, offset); + + return __valid_oactxctrl_offset(offset) ?
0 : -ENODEV; +} + +static bool engine_supports_mi_query(struct intel_engine_cs *engine) +{ + return engine->class == RENDER_CLASS; +} + /** * oa_get_render_ctx_id - determine and hold ctx hw id * @stream: An i915-perf stream opened for OA metrics @@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) if (IS_ERR(ce)) return PTR_ERR(ce); + if (engine_supports_mi_query(stream->engine)) { + ret = __set_oa_ctx_ctrl_offset(ce); + if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) { + intel_context_unpin(ce); + drm_err(&stream->perf->i915->drm, + "Enabling perf query failed for %s\n"
Re: [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
On 06/09/2022 23:28, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 10:56:13PM +0300, Lionel Landwerlin wrote: On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: User passes uabi engine class and instance to the perf OA interface. Use gt corresponding to the engine to pin the buffers to the right ggtt. Signed-off-by: Umesh Nerlige Ramappa I didn't know there was a GGTT per engine. Do I understand this correct? No, GGTT is still per-gt. We just derive the gt from engine class instance passed (as in engine->gt). Oh thanks I understand now. Reviewed-by: Lionel Landwerlin Thanks, -Lionel --- drivers/gpu/drm/i915/i915_perf.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 87b92d2946f4..f7621b45966c 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream) static int alloc_oa_buffer(struct i915_perf_stream *stream) { struct drm_i915_private *i915 = stream->perf->i915; + struct intel_gt *gt = stream->engine->gt; struct drm_i915_gem_object *bo; struct i915_vma *vma; int ret; @@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC); /* PreHSW required 512K alignment, HSW requires 16M */ - vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0); + vma = i915_vma_instance(bo, >->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto err_unref; } + + /* + * PreHSW required 512K alignment. + * HSW and onwards, align to requested size of OA buffer. 
+ */ + ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH); + if (ret) { + drm_err(>->i915->drm, "Failed to pin OA buffer %d\n", ret); + goto err_unref; + } + stream->oa_buffer.vma = vma; stream->oa_buffer.vaddr = @@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, static int alloc_noa_wait(struct i915_perf_stream *stream) { struct drm_i915_private *i915 = stream->perf->i915; + struct intel_gt *gt = stream->engine->gt; struct drm_i915_gem_object *bo; struct i915_vma *vma; const u64 delay_ticks = 0x - @@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) * multiple OA config BOs will have a jump to this address and it * needs to be fixed during the lifetime of the i915/perf stream. */ - vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH); + vma = i915_vma_instance(bo, >->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto out_ww; } + ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH); + if (ret) + goto out_ww; + batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); if (IS_ERR(batch)) { ret = PTR_ERR(batch);
Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
On 06/09/2022 22:46, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 10:35:16PM +0300, Lionel Landwerlin wrote: On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Add new OA formats for DG2. Some of the newer OA formats are not multiples of 64 bytes and are not powers of 2. For those formats, adjust hw_tail accordingly when checking for new reports. Signed-off-by: Umesh Nerlige Ramappa Apart from the coding style issue: Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 63 include/uapi/drm/i915_drm.h | 6 +++ 2 files changed, 46 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 735244a3aedd..c8331b549d31 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 10; /* XXX: beware if future OA HW adds new report formats that the current * code assumes all reports have a power-of-two size and ~(size - 1) can - * be used as a mask to align the OA tail pointer. + * be used as a mask to align the OA tail pointer. In some of the + * formats, R is used to denote a reserved field.
 */ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = { [I915_OA_FORMAT_A13] = { 0, 64 }, @@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = { [I915_OA_FORMAT_A12] = { 0, 64 }, [I915_OA_FORMAT_A12_B8_C8] = { 2, 128 }, [I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, + [I915_OAR_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, + [I915_OA_FORMAT_A24u40_A14u32_B8_C8] = { 5, 256 }, + [I915_OAR_FORMAT_A36u64_B8_C8] = { 1, 384 }, + [I915_OA_FORMAT_A38u64_R2u64_B8_C8] = { 1, 448 }, }; #define SAMPLE_OA_REPORT (1<<0) @@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) bool pollin; u32 hw_tail; u64 now; + u32 partial_report_size; /* We have to consider the (unlikely) possibility that read() errors * could result in an OA buffer reset which might reset the head and @@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) hw_tail = stream->perf->ops.oa_hw_tail_read(stream); - /* The tail pointer increases in 64 byte increments, - * not in report_size steps... + /* The tail pointer increases in 64 byte increments, whereas report + * sizes need not be integral multiples of 64 or powers of 2. 
+ * Compute potentially partially landed report in the OA buffer */ - hw_tail &= ~(report_size - 1); + partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail); + partial_report_size %= report_size; + + /* Subtract partial amount off the tail */ + hw_tail = gtt_offset + ((hw_tail - partial_report_size) & + (stream->oa_buffer.vma->size - 1)); now = ktime_get_mono_fast_ns(); @@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, { int report_size = stream->oa_buffer.format_size; struct drm_i915_perf_record_header header; + int report_size_partial; + u8 *oa_buf_end; header.type = DRM_I915_PERF_RECORD_SAMPLE; header.pad = 0; @@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream, return -EFAULT; buf += sizeof(header); - if (copy_to_user(buf, report, report_size)) + oa_buf_end = stream->oa_buffer.vaddr + + stream->oa_buffer.vma->size; + report_size_partial = oa_buf_end - report; + + if (report_size_partial < report_size) { + if(copy_to_user(buf, report, report_size_partial)) + return -EFAULT; + buf += report_size_partial; + + if(copy_to_user(buf, stream->oa_buffer.vaddr, + report_size - report_size_partial)) + return -EFAULT; I think the coding style requires you to use if () not if() Will fix. Just a suggestion : you could make this code deal with the partial bit as the main bit of the function : oa_buf_end = stream->oa_buffer.vaddr + stream->oa_buffer.vma->size; report_size_partial = oa_buf_end - report; if (copy_to_user(buf, report, report_size_partial)) return -EFAULT; buf += report_size_partial; This ^ may not work because append_oa_sample is appending exactly one report to the user buffer, whereas the above may append more than one. Thanks, Umesh Ah I see, thanks for pointing this out. 
-Lionel if (report_size_partial < report_size && copy_to_user(buf, stream->oa_buffer.vaddr, report_size - report_size_partial)) return -EFAULT; buf += report_size - report_size_partial; + } else if (copy_to_user(buf, report, report_size)) return -EFAULT; (*offset) += header.size; @@ -684,8 +710,8 @@ static int g
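The wraparound copy being debated above reduces to a two-piece ring-buffer copy. A minimal userspace sketch of that logic, with plain memcpy standing in for copy_to_user and illustrative (not real OA) sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one report out of a ring buffer of oa_size bytes.  A report
 * starting near the end of the buffer wraps around, so it is copied
 * in two pieces, mirroring the append_oa_sample() change. */
static void copy_report(unsigned char *dst, const unsigned char *oa_vaddr,
                        size_t oa_size, size_t report_offset,
                        size_t report_size)
{
    size_t room = oa_size - report_offset; /* bytes before buffer end */

    if (room < report_size) {
        /* Partial piece up to the end of the buffer... */
        memcpy(dst, oa_vaddr + report_offset, room);
        /* ...then the remainder from the start of the buffer. */
        memcpy(dst + room, oa_vaddr, report_size - room);
    } else {
        memcpy(dst, oa_vaddr + report_offset, report_size);
    }
}
```

Umesh's point above is exactly why the non-wrapping branch stays separate: the helper emits one report per call, never more.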
Re: [Intel-gfx] [PATCH 11/19] drm/i915/perf: Store a pointer to oa_format in oa_buffer
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: DG2 introduces OA reports with 64 bit report header fields. Perf OA would need more information about the OA format in order to process such reports. Store all OA format info in oa_buffer instead of just the size and format-id. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 23 ++- drivers/gpu/drm/i915/i915_perf_types.h | 3 +-- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f7621b45966c..9e455bd3bce5 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -483,7 +483,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream) static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) { u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); - int report_size = stream->oa_buffer.format_size; + int report_size = stream->oa_buffer.format->size; unsigned long flags; bool pollin; u32 hw_tail; @@ -630,7 +630,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, size_t *offset, const u8 *report) { - int report_size = stream->oa_buffer.format_size; + int report_size = stream->oa_buffer.format->size; struct drm_i915_perf_record_header header; int report_size_partial; u8 *oa_buf_end; @@ -694,7 +694,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, size_t *offset) { struct intel_uncore *uncore = stream->uncore; - int report_size = stream->oa_buffer.format_size; + int report_size = stream->oa_buffer.format->size; u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); size_t start_offset = *offset; @@ -970,7 +970,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, size_t *offset) { struct intel_uncore *uncore = stream->uncore; - int report_size = stream->oa_buffer.format_size; + int report_size = stream->oa_buffer.format->size; 
u8 *oa_buf_base = stream->oa_buffer.vaddr; u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); u32 mask = (OA_BUFFER_SIZE - 1); @@ -2517,7 +2517,7 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream, { int err; struct intel_context *ce = stream->pinned_ctx; - u32 format = stream->oa_buffer.format; + u32 format = stream->oa_buffer.format->format; u32 offset = stream->perf->ctx_oactxctrl_offset; struct flex regs_context[] = { { @@ -2890,7 +2890,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) u32 ctx_id = stream->specific_ctx_id; bool periodic = stream->periodic; u32 period_exponent = stream->period_exponent; - u32 report_format = stream->oa_buffer.format; + u32 report_format = stream->oa_buffer.format->format; /* * Reset buf pointers so we don't forward reports from before now. @@ -2916,7 +2916,7 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) static void gen8_oa_enable(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; - u32 report_format = stream->oa_buffer.format; + u32 report_format = stream->oa_buffer.format->format; /* * Reset buf pointers so we don't forward reports from before now. 
@@ -2942,7 +2942,7 @@ static void gen8_oa_enable(struct i915_perf_stream *stream) static void gen12_oa_enable(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; - u32 report_format = stream->oa_buffer.format; + u32 report_format = stream->oa_buffer.format->format; /* * If we don't want OA reports from the OA buffer, then we don't even @@ -3184,15 +3184,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, stream->sample_flags = props->sample_flags; stream->sample_size += format_size; - stream->oa_buffer.format_size = format_size; - if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0)) + stream->oa_buffer.format = &perf->oa_formats[props->oa_format]; + if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format->size == 0)) return -EINVAL; stream->hold_preemption = props->hold_preemption; - stream->oa_buffer.format = - perf->oa_formats[props->oa_format].format; - stream->periodic = props->oa_pe
Re: [Intel-gfx] [PATCH 10/19] drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: User passes uabi engine class and instance to the perf OA interface. Use gt corresponding to the engine to pin the buffers to the right ggtt. Signed-off-by: Umesh Nerlige Ramappa I didn't know there was a GGTT per engine. Do I understand this correctly? Thanks, -Lionel --- drivers/gpu/drm/i915/i915_perf.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 87b92d2946f4..f7621b45966c 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1765,6 +1765,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream) static int alloc_oa_buffer(struct i915_perf_stream *stream) { struct drm_i915_private *i915 = stream->perf->i915; + struct intel_gt *gt = stream->engine->gt; struct drm_i915_gem_object *bo; struct i915_vma *vma; int ret; @@ -1784,11 +1785,22 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC); /* PreHSW required 512K alignment, HSW requires 16M */ - vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0); + vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto err_unref; } + + /* + * PreHSW required 512K alignment. + * HSW and onwards, align to requested size of OA buffer. 
+ */ + ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH); + if (ret) { + drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret); + goto err_unref; + } + stream->oa_buffer.vma = vma; stream->oa_buffer.vaddr = @@ -1838,6 +1850,7 @@ static u32 *save_restore_register(struct i915_perf_stream *stream, u32 *cs, static int alloc_noa_wait(struct i915_perf_stream *stream) { struct drm_i915_private *i915 = stream->perf->i915; + struct intel_gt *gt = stream->engine->gt; struct drm_i915_gem_object *bo; struct i915_vma *vma; const u64 delay_ticks = 0x - @@ -1878,12 +1891,16 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) * multiple OA config BOs will have a jump to this address and it * needs to be fixed during the lifetime of the i915/perf stream. */ - vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH); + vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto out_ww; } + ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH); + if (ret) + goto out_ww; + batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); if (IS_ERR(batch)) { ret = PTR_ERR(batch);
Re: [Intel-gfx] [PATCH 08/19] drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Make perf part of gt as the OAG buffer is specific to a gt. The refactor eventually simplifies programming the right OA buffer and the right HW registers when supporting multiple gts. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_types.h | 3 + drivers/gpu/drm/i915/gt/intel_sseu.c | 4 +- drivers/gpu/drm/i915/i915_perf.c | 75 +- drivers/gpu/drm/i915/i915_perf_types.h | 39 +-- drivers/gpu/drm/i915/selftests/i915_perf.c | 16 +++-- 5 files changed, 80 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index 4d56f7d5a3be..3d079d206cec 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -20,6 +20,7 @@ #include "intel_gsc.h" #include "i915_vma.h" +#include "i915_perf_types.h" #include "intel_engine_types.h" #include "intel_gt_buffer_pool_types.h" #include "intel_hwconfig.h" @@ -260,6 +261,8 @@ struct intel_gt { /* sysfs defaults per gt */ struct gt_defaults defaults; struct kobject *sysfs_defaults; + + struct i915_perf_gt perf; }; enum intel_gt_scratch_field { diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index c6d3050604c8..fcaf3c58b554 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -678,8 +678,8 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt, * If i915/perf is active, we want a stable powergating configuration * on the system. Use the configuration pinned by i915/perf. 
 */ - if (i915->perf.exclusive_stream) - req_sseu = &i915->perf.sseu; + if (gt->perf.exclusive_stream) + req_sseu = &gt->perf.sseu; slices = hweight8(req_sseu->slice_mask); subslices = hweight8(req_sseu->subslice_mask); diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3e3bda147c48..5dccb3c5 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1577,8 +1577,9 @@ free_noa_wait(struct i915_perf_stream *stream) static void i915_oa_stream_destroy(struct i915_perf_stream *stream) { struct i915_perf *perf = stream->perf; + struct intel_gt *gt = stream->engine->gt; - BUG_ON(stream != perf->exclusive_stream); + BUG_ON(stream != gt->perf.exclusive_stream); /* * Unset exclusive_stream first, it will be checked while disabling @@ -1586,7 +1587,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) * * See i915_oa_init_reg_state() and lrc_configure_all_contexts() */ - WRITE_ONCE(perf->exclusive_stream, NULL); + WRITE_ONCE(gt->perf.exclusive_stream, NULL); perf->ops.disable_metric_set(stream); free_oa_buffer(stream); @@ -2579,10 +2580,11 @@ oa_configure_all_contexts(struct i915_perf_stream *stream, { struct drm_i915_private *i915 = stream->perf->i915; struct intel_engine_cs *engine; + struct intel_gt *gt = stream->engine->gt; struct i915_gem_context *ctx, *cn; int err; - lockdep_assert_held(&stream->perf->lock); + lockdep_assert_held(&gt->perf.lock); /* * The OA register config is setup through the context image. 
This image @@ -3103,6 +3105,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, { struct drm_i915_private *i915 = stream->perf->i915; struct i915_perf *perf = stream->perf; + struct intel_gt *gt; int format_size; int ret; @@ -3111,6 +3114,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, "OA engine not specified\n"); return -EINVAL; } + gt = props->engine->gt; /* * If the sysfs metrics/ directory wasn't registered for some @@ -3141,7 +3145,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * counter reports and marshal to the appropriate client * we currently only allow exclusive access */ - if (perf->exclusive_stream) { + if (gt->perf.exclusive_stream) { drm_dbg(&stream->perf->i915->drm, "OA unit already in use\n"); return -EBUSY; @@ -3221,8 +3225,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, stream->ops = &i915_oa_stream_ops; - perf->sseu = props->sseu; - WRITE_ONCE(perf->exclusive_stream, stream); + stream->engine->gt->perf.sseu = pr
Re: [Intel-gfx] [PATCH 07/19] drm/i915/perf: Simply use stream->ctx
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Earlier code used exclusive_stream to check for user passed context. Simplify this by accessing stream->ctx. Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index bbf1c574f393..3e3bda147c48 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -801,7 +801,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * switches since it's not-uncommon for periodic samples to * identify a switch before any 'context switch' report. */ - if (!stream->perf->exclusive_stream->ctx || + if (!stream->ctx || stream->specific_ctx_id == ctx_id || stream->oa_buffer.last_ctx_id == stream->specific_ctx_id || reason & OAREPORT_REASON_CTX_SWITCH) { @@ -810,7 +810,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * While filtering for a single context we avoid * leaking the IDs of other contexts. */ - if (stream->perf->exclusive_stream->ctx && + if (stream->ctx && stream->specific_ctx_id != ctx_id) { report32[2] = INVALID_CTX_ID; }
Re: [Intel-gfx] [PATCH 05/19] drm/i915/perf: Enable commands per clock reporting in OA
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: XEHPSDV and DG2 provide a way to configure bytes per clock vs commands per clock reporting. Enable command per clock setting on enabling OA. Signed-off-by: Umesh Nerlige Ramappa Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 3 +++ drivers/gpu/drm/i915/i915_pci.c | 1 + drivers/gpu/drm/i915/i915_perf.c | 20 drivers/gpu/drm/i915/i915_perf_oa_regs.h | 4 drivers/gpu/drm/i915/intel_device_info.h | 1 + 5 files changed, 29 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index b4733c5a01da..b2e8a44bd976 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1287,6 +1287,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm) #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc) +#define HAS_OA_BPC_REPORTING(dev_priv) \ + (INTEL_INFO(dev_priv)->has_oa_bpc_reporting) + /* * Set this flag, when platform requires 64K GTT page sizes or larger for * device local memory access. 
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index d8446bb25d5e..bd0b8502b91e 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1019,6 +1019,7 @@ static const struct intel_device_info adl_p_info = { .has_logical_ring_contexts = 1, \ .has_logical_ring_elsq = 1, \ .has_mslice_steering = 1, \ + .has_oa_bpc_reporting = 1, \ .has_rc6 = 1, \ .has_reset_engine = 1, \ .has_rps = 1, \ diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index efa7eda83edd..6fc4f0d8fc5a 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2745,10 +2745,12 @@ static int gen12_enable_metric_set(struct i915_perf_stream *stream, struct i915_active *active) { + struct drm_i915_private *i915 = stream->perf->i915; struct intel_uncore *uncore = stream->uncore; struct i915_oa_config *oa_config = stream->oa_config; bool periodic = stream->periodic; u32 period_exponent = stream->period_exponent; + u32 sqcnt1; int ret; intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG, @@ -2767,6 +2769,16 @@ gen12_enable_metric_set(struct i915_perf_stream *stream, (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT)) : 0); + /* +* Initialize Super Queue Internal Cnt Register +* Set PMON Enable in order to collect valid metrics. +* Enable commands per clock reporting in OA for XEHPSDV onward. +*/ + sqcnt1 = GEN12_SQCNT1_PMON_ENABLE | +(HAS_OA_BPC_REPORTING(i915) ? 
GEN12_SQCNT1_OABPC : 0); + + intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1); + /* * Update all contexts prior writing the mux configurations as we need * to make sure all slices/subslices are ON before writing to NOA @@ -2816,6 +2828,8 @@ static void gen11_disable_metric_set(struct i915_perf_stream *stream) static void gen12_disable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; + struct drm_i915_private *i915 = stream->perf->i915; + u32 sqcnt1; /* Reset all contexts' slices/subslices configurations. */ gen12_configure_all_contexts(stream, NULL, NULL); @@ -2826,6 +2840,12 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream) /* Make sure we disable noa to save power. */ intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0); + + sqcnt1 = GEN12_SQCNT1_PMON_ENABLE | +(HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0); + + /* Reset PMON Enable to save power. */ + intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0); } static void gen7_oa_enable(struct i915_perf_stream *stream) diff --git a/drivers/gpu/drm/i915/i915_perf_oa_regs.h b/drivers/gpu/drm/i915/i915_perf_oa_regs.h index 0ef3562ff4aa..381d94101610 100644 --- a/drivers/gpu/drm/i915/i915_perf_oa_regs.h +++ b/drivers/gpu/drm/i915/i915_perf_oa_regs.h @@ -134,4 +134,8 @@ #define GDT_CHICKEN_BITS _MMIO(0x9840) #define GT_NOA_ENABLE 0x0080 +#define GEN12_SQCNT1 _MMIO(0x8718) +#define GEN12_SQCNT1_PMON_ENABLE REG_BIT(30) +#define GEN12_SQCNT1_OABPC REG_BIT(29) + #endif /* __INTEL_PERF_OA_REGS__ */ diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 23bf230aa104..fc2a0660426e 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -163,6 +163,7 @@ enum intel_
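Both hunks program SQCNT1 through i915's read-modify-write helper. What intel_uncore_rmw() does can be modelled in a few lines (register storage is faked with a plain value; only the bit definitions are taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define REG_BIT(n)               (1u << (n))
#define GEN12_SQCNT1_PMON_ENABLE REG_BIT(30)
#define GEN12_SQCNT1_OABPC       REG_BIT(29)

/* Model of intel_uncore_rmw(uncore, reg, clear, set):
 * read the register, drop the bits in `clear`, OR in `set`. */
static uint32_t rmw(uint32_t old, uint32_t clear, uint32_t set)
{
    return (old & ~clear) | set;
}
```

The enable path is rmw(v, 0, sqcnt1) and the disable path rmw(v, sqcnt1, 0), which is why the same sqcnt1 mask is built in both functions.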
Re: [Intel-gfx] [PATCH 04/19] drm/i915/perf: Determine gen12 oa ctx offset at runtime
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Some SKUs of same gen12 platform may have different oactxctrl offsets. For gen12, determine oactxctrl offsets at runtime. Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_perf.c | 149 ++- drivers/gpu/drm/i915/i915_perf_oa_regs.h | 2 +- 2 files changed, 120 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3526693d64fa..efa7eda83edd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1363,6 +1363,67 @@ static int gen12_get_render_context_id(struct i915_perf_stream *stream) return 0; } +#define MI_OPCODE(x) (((x) >> 23) & 0x3f) +#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0))) +#define MI_LRI_LEN(x) (((x) & 0xff) + 1) Maybe you want to put this in intel_gpu_commands.h +#define __valid_oactxctrl_offset(x) ((x) && (x) != U32_MAX) +static bool __find_reg_in_lri(u32 *state, u32 reg, u32 *offset) +{ + u32 idx = *offset; + u32 len = MI_LRI_LEN(state[idx]) + idx; + + idx++; + for (; idx < len; idx += 2) + if (state[idx] == reg) + break; + + *offset = idx; + return state[idx] == reg; +} + +static u32 __context_image_offset(struct intel_context *ce, u32 reg) +{ + u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4; + u32 *state = ce->lrc_reg_state; + + for (offset = 0; offset < len; ) { + if (IS_MI_LRI_CMD(state[offset])) { I'm a bit concerned you might find other matches with this. Because let's say you run into a 3DSTATE_SUBSLICE_HASH_TABLE instruction, you'll iterate the instruction dword by dword because you don't know how to read its length and skip to the next one. Now some of the fields can be programmed from userspace to look like an MI_LRI header, so you start to read data in the wrong way. Unfortunately I don't have a better solution. 
My only ask is that you make __find_reg_in_lri() take the context image size as a parameter so it NEVER goes over the context image. To limit the risk you should run this function only once at driver initialization and store the found offset. Thanks, -Lionel + if (__find_reg_in_lri(state, reg, &offset)) + break; + } else { + offset++; + } + } + + return offset < len ? offset : U32_MAX; +} + +static int __set_oa_ctx_ctrl_offset(struct intel_context *ce) +{ + i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base); + struct i915_perf *perf = &ce->engine->i915->perf; + u32 saved_offset = perf->ctx_oactxctrl_offset; + u32 offset; + + /* Do this only once. Failure is stored as offset of U32_MAX */ + if (saved_offset) + return 0; + + offset = __context_image_offset(ce, i915_mmio_reg_offset(reg)); + perf->ctx_oactxctrl_offset = offset; + + drm_dbg(&ce->engine->i915->drm, + "%s oa ctx control at 0x%08x dword offset\n", + ce->engine->name, offset); + + return __valid_oactxctrl_offset(offset) ? 
0 : -ENODEV; +} + +static bool engine_supports_mi_query(struct intel_engine_cs *engine) +{ + return engine->class == RENDER_CLASS; +} + /** * oa_get_render_ctx_id - determine and hold ctx hw id * @stream: An i915-perf stream opened for OA metrics @@ -1382,6 +1443,17 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream) if (IS_ERR(ce)) return PTR_ERR(ce); + if (engine_supports_mi_query(stream->engine)) { + ret = __set_oa_ctx_ctrl_offset(ce); + if (ret && !(stream->sample_flags & SAMPLE_OA_REPORT)) { + intel_context_unpin(ce); + drm_err(&stream->perf->i915->drm, + "Enabling perf query failed for %s\n", + stream->engine->name); + return ret; + } + } + switch (GRAPHICS_VER(ce->engine->i915)) { case 7: { /* @@ -2412,10 +2484,11 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream, int err; struct intel_context *ce = stream->pinned_ctx; u32 format = stream->oa_buffer.format; + u32 offset = stream->perf->ctx_oactxctrl_offset; struct flex regs_context[] = { { GEN8_OACTXCONTROL, - stream->perf->ctx_oactxctrl_offset + 1, + offset + 1, active ? GEN8_OA_COUNTER_RESUME : 0, }, }; @@ -2440,15 +2513,18 @@ static int gen12_configure_oar_context(struct i915_perf_stream *stream, }, }; - /* Modify the context image of pinned context with
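Lionel's ask can be made concrete: bound the LRI scan by an explicit image size so a bogus header can never walk past the end. A sketch using the patch's opcode/length macros (the test image used in the check is fabricated):

```c
#include <assert.h>
#include <stdint.h>

#define MI_OPCODE(x)     (((x) >> 23) & 0x3f)
#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == 0x22)
#define MI_LRI_LEN(x)    (((x) & 0xff) + 1)

/* Scan at most `len` dwords of a context image for an MI_LRI packet
 * loading `reg`; return the dword offset of the matching register
 * slot, or UINT32_MAX if not found.  Every access is bounded by `len`. */
static uint32_t find_reg_offset(const uint32_t *state, uint32_t len,
                                uint32_t reg)
{
    uint32_t i = 0;

    while (i < len) {
        if (IS_MI_LRI_CMD(state[i])) {
            uint32_t end = i + MI_LRI_LEN(state[i]);

            /* reg/value pairs follow the header: register slots sit
             * at every other dword. */
            for (i = i + 1; i < end && i < len; i += 2)
                if (state[i] == reg)
                    return i;
        } else {
            i++;
        }
    }
    return UINT32_MAX;
}
```

As in the patch, a false MI_LRI match only mis-steps the cursor; the `i < len` bound guarantees the scan still terminates inside the image.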
Re: [Intel-gfx] [PATCH 02/19] drm/i915/perf: Add OA formats for DG2
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: Add new OA formats for DG2. Some of the newer OA formats are not multiples of 64 bytes and are not powers of 2. For those formats, adjust hw_tail accordingly when checking for new reports. Signed-off-by: Umesh Nerlige Ramappa Apart from the coding style issue : Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 63 include/uapi/drm/i915_drm.h | 6 +++ 2 files changed, 46 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 735244a3aedd..c8331b549d31 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -306,7 +306,8 @@ static u32 i915_oa_max_sample_rate = 100000; /* XXX: beware if future OA HW adds new report formats that the current * code assumes all reports have a power-of-two size and ~(size - 1) can - * be used as a mask to align the OA tail pointer. + * be used as a mask to align the OA tail pointer. In some of the + * formats, R is used to denote reserved field. 
 */ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = { [I915_OA_FORMAT_A13] = { 0, 64 }, @@ -320,6 +321,10 @@ static const struct i915_oa_format oa_formats[I915_OA_FORMAT_MAX] = { [I915_OA_FORMAT_A12] = { 0, 64 }, [I915_OA_FORMAT_A12_B8_C8] = { 2, 128 }, [I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, + [I915_OAR_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, + [I915_OA_FORMAT_A24u40_A14u32_B8_C8] = { 5, 256 }, + [I915_OAR_FORMAT_A36u64_B8_C8] = { 1, 384 }, + [I915_OA_FORMAT_A38u64_R2u64_B8_C8] = { 1, 448 }, }; #define SAMPLE_OA_REPORT (1<<0) @@ -467,6 +472,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) bool pollin; u32 hw_tail; u64 now; + u32 partial_report_size; /* We have to consider the (unlikely) possibility that read() errors * could result in an OA buffer reset which might reset the head and @@ -476,10 +482,16 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) hw_tail = stream->perf->ops.oa_hw_tail_read(stream); - /* The tail pointer increases in 64 byte increments, - * not in report_size steps... + /* The tail pointer increases in 64 byte increments, whereas report + * sizes need not be integral multiples of 64 or powers of 2. 
+* Compute potentially partially landed report in the OA buffer */ - hw_tail &= ~(report_size - 1); + partial_report_size = OA_TAKEN(hw_tail, stream->oa_buffer.tail); + partial_report_size %= report_size; + + /* Subtract partial amount off the tail */ + hw_tail = gtt_offset + ((hw_tail - partial_report_size) & + (stream->oa_buffer.vma->size - 1)); now = ktime_get_mono_fast_ns(); @@ -601,6 +613,8 @@ static int append_oa_sample(struct i915_perf_stream *stream, { int report_size = stream->oa_buffer.format_size; struct drm_i915_perf_record_header header; + int report_size_partial; + u8 *oa_buf_end; header.type = DRM_I915_PERF_RECORD_SAMPLE; header.pad = 0; @@ -614,7 +628,19 @@ static int append_oa_sample(struct i915_perf_stream *stream, return -EFAULT; buf += sizeof(header); - if (copy_to_user(buf, report, report_size)) + oa_buf_end = stream->oa_buffer.vaddr + +stream->oa_buffer.vma->size; + report_size_partial = oa_buf_end - report; + + if (report_size_partial < report_size) { + if(copy_to_user(buf, report, report_size_partial)) + return -EFAULT; + buf += report_size_partial; + + if(copy_to_user(buf, stream->oa_buffer.vaddr, + report_size - report_size_partial)) + return -EFAULT; I think the coding style requires you to use if () not if() Just a suggestion : you could make this code deal with the partial bit as the main bit of the function : oa_buf_end = stream->oa_buffer.vaddr + stream->oa_buffer.vma->size; report_size_partial = oa_buf_end - report; if (copy_to_user(buf, report, report_size_partial)) return -EFAULT; buf += report_size_partial; if (report_size_partial < report_size && copy_to_user(buf, stream->oa_buffer.vaddr, report_size - report_size_partial)) return -EFAULT; buf += report_size - report_size_partial; + } else if (copy_to_user(buf, report, report_size)) return -EFAULT; (*offset) += header.size; @@ -684,8 +710,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, * all a power of two). */
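The hw_tail adjustment above is modular arithmetic: measure how far the hardware tail has advanced past the software tail, keep only the partial-report remainder, and subtract it back, wrapping on the power-of-two buffer size. A standalone sketch (oa_taken() stands in for OA_TAKEN; gtt_offset handling is dropped, so offsets here are already buffer-relative, and the sizes in the check are made up):

```c
#include <assert.h>
#include <stdint.h>

/* OA_TAKEN for an explicit power-of-two buffer size: how many bytes
 * sit between the software tail and the hardware tail in the ring. */
static uint32_t oa_taken(uint32_t head, uint32_t tail, uint32_t buf_size)
{
    return (head - tail) & (buf_size - 1);
}

/* Back hw_tail up to the last whole-report boundary.  report_size
 * need not divide buf_size or be a power of two, hence the modulo. */
static uint32_t aligned_hw_tail(uint32_t hw_tail, uint32_t sw_tail,
                                uint32_t report_size, uint32_t buf_size)
{
    uint32_t partial = oa_taken(hw_tail, sw_tail, buf_size) % report_size;

    return (hw_tail - partial) & (buf_size - 1);
}
```

The old `hw_tail &= ~(report_size - 1)` is the special case of this where report_size is a power of two and the buffer is a multiple of it.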
Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
On 06/09/2022 20:39, Umesh Nerlige Ramappa wrote: On Tue, Sep 06, 2022 at 05:33:00PM +0300, Lionel Landwerlin wrote: On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: With GuC mode of submission, GuC is in control of defining the context id field that is part of the OA reports. To filter reports, UMD and KMD must know what sw context id was chosen by GuC. There is no interface between KMD and GuC to determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA reports for the specific context. Signed-off-by: Umesh Nerlige Ramappa I assume you checked with GuC that this doesn't change as the context is running? If that's not the case then filtering is broken. Correct. With i915/execlist submission mode, we had to ask i915 to pin the sw_id/ctx_id. From GuC perspective, the context id can change once KMD de-registers the context and that will not happen while the context is in use. Thanks, Umesh Thanks Umesh, Maybe I should have been more precise in my question : Can the ID change while the i915-perf stream is opened? Because the ID not changing while the context is running makes sense. But since the number of available IDs is limited to 2k or something on Gfx12, it's possible the GuC has to reuse IDs if too many apps want to run during the period of time while i915-perf is active and filtering. -Lionel
-Lionel --- drivers/gpu/drm/i915/gt/intel_lrc.h | 2 + drivers/gpu/drm/i915/i915_perf.c | 141 2 files changed, 124 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h index a390f0813c8b..7111bae759f3 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h @@ -110,6 +110,8 @@ enum { #define XEHP_SW_CTX_ID_WIDTH 16 #define XEHP_SW_COUNTER_SHIFT 58 #define XEHP_SW_COUNTER_WIDTH 6 +#define GEN12_GUC_SW_CTX_ID_SHIFT 39 +#define GEN12_GUC_SW_CTX_ID_WIDTH 16 static inline void lrc_runtime_start(struct intel_context *ce) { diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f3c23fe9ad9c..735244a3aedd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) return stream->pinned_ctx; } +static int +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset) +{ + u32 *cs, cmd; + + cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; + if (GRAPHICS_VER(rq->engine->i915) >= 8) + cmd++; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + *cs++ = cmd; + *cs++ = i915_mmio_reg_offset(reg); + *cs++ = ggtt_offset; + *cs++ = 0; + + intel_ring_advance(rq, cs); + + return 0; +} + +static int +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset) +{ + struct i915_request *rq; + int err; + + rq = i915_request_create(ce); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + i915_request_get(rq); + + err = __store_reg_to_mem(rq, reg, ggtt_offset); + + i915_request_add(rq); + if (!err && i915_request_wait(rq, 0, HZ / 2) < 0) + err = -ETIME; + + i915_request_put(rq); + + return err; +} + +static int +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id) +{ + struct i915_vma *scratch; + u32 *val; + int err; + + scratch = 
__vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4); + if (IS_ERR(scratch)) + return PTR_ERR(scratch); + + err = i915_vma_sync(scratch); + if (err) + goto err_scratch; + + err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base), + i915_ggtt_offset(scratch)); + if (err) + goto err_scratch; + + val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); + if (IS_ERR(val)) { + err = PTR_ERR(val); + goto err_scratch; + } + + *ctx_id = *val; + i915_gem_object_unpin_map(scratch->obj); + +err_scratch: + i915_vma_unpin_and_release(&scratch, 0); + return err; +} + +/* + * For execlist mode of submission, pick an unused context id + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts + * XXX_MAX_CONTEXT_HW_ID is used by idle context + * + * For GuC mode of submission read context id from the upper dword of the + * EXECLIST_STATUS register. + */ +static int gen12_get_render_context_id(struct i915_perf_stream *stream) +{ + u32 ctx_id, mask; + int ret; + + if (intel_engine_uses_guc(stream->engine)) { + ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id); + if (ret) + return ret; + + mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) << + (GEN12_GUC_SW_CTX_ID_SHIFT - 32); +
Re: [Intel-gfx] [PATCH 01/19] drm/i915/perf: Fix OA filtering logic for GuC mode
On 23/08/2022 23:41, Umesh Nerlige Ramappa wrote: With GuC mode of submission, GuC is in control of defining the context id field that is part of the OA reports. To filter reports, UMD and KMD must know what sw context id was chosen by GuC. There is not interface between KMD and GuC to determine this, so read the upper-dword of EXECLIST_STATUS to filter/squash OA reports for the specific context. Signed-off-by: Umesh Nerlige Ramappa I assume you checked with GuC that this doesn't change as the context is running? With i915/execlist submission mode, we had to ask i915 to pin the sw_id/ctx_id. If that's not the case then filtering is broken. -Lionel --- drivers/gpu/drm/i915/gt/intel_lrc.h | 2 + drivers/gpu/drm/i915/i915_perf.c| 141 2 files changed, 124 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h index a390f0813c8b..7111bae759f3 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h @@ -110,6 +110,8 @@ enum { #define XEHP_SW_CTX_ID_WIDTH 16 #define XEHP_SW_COUNTER_SHIFT 58 #define XEHP_SW_COUNTER_WIDTH 6 +#define GEN12_GUC_SW_CTX_ID_SHIFT 39 +#define GEN12_GUC_SW_CTX_ID_WIDTH 16 static inline void lrc_runtime_start(struct intel_context *ce) { diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index f3c23fe9ad9c..735244a3aedd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1233,6 +1233,125 @@ static struct intel_context *oa_pin_context(struct i915_perf_stream *stream) return stream->pinned_ctx; } +static int +__store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset) +{ + u32 *cs, cmd; + + cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; + if (GRAPHICS_VER(rq->engine->i915) >= 8) + cmd++; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + *cs++ = cmd; + *cs++ = i915_mmio_reg_offset(reg); + *cs++ = ggtt_offset; + *cs++ = 0; + + 
intel_ring_advance(rq, cs); + + return 0; +} + +static int +__read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset) +{ + struct i915_request *rq; + int err; + + rq = i915_request_create(ce); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + i915_request_get(rq); + + err = __store_reg_to_mem(rq, reg, ggtt_offset); + + i915_request_add(rq); + if (!err && i915_request_wait(rq, 0, HZ / 2) < 0) + err = -ETIME; + + i915_request_put(rq); + + return err; +} + +static int +gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id) +{ + struct i915_vma *scratch; + u32 *val; + int err; + + scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4); + if (IS_ERR(scratch)) + return PTR_ERR(scratch); + + err = i915_vma_sync(scratch); + if (err) + goto err_scratch; + + err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base), +i915_ggtt_offset(scratch)); + if (err) + goto err_scratch; + + val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); + if (IS_ERR(val)) { + err = PTR_ERR(val); + goto err_scratch; + } + + *ctx_id = *val; + i915_gem_object_unpin_map(scratch->obj); + +err_scratch: + i915_vma_unpin_and_release(&scratch, 0); + return err; +} + +/* + * For execlist mode of submission, pick an unused context id + * 0 - (NUM_CONTEXT_TAG -1) are used by other contexts + * XXX_MAX_CONTEXT_HW_ID is used by idle context + * + * For GuC mode of submission read context id from the upper dword of the + * EXECLIST_STATUS register. 
+ */ +static int gen12_get_render_context_id(struct i915_perf_stream *stream) +{ + u32 ctx_id, mask; + int ret; + + if (intel_engine_uses_guc(stream->engine)) { + ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id); + if (ret) + return ret; + + mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) << + (GEN12_GUC_SW_CTX_ID_SHIFT - 32); + } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) { + ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) << + (XEHP_SW_CTX_ID_SHIFT - 32); + + mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) << + (XEHP_SW_CTX_ID_SHIFT - 32); + } else { + ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) << +(GEN11_SW_CTX_ID_SHIFT - 32); + + mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << + (GEN11_SW_CTX_ID_SHIFT - 32); +
Re: [Intel-gfx] [PATCH v3] drm/i915/dg2: Add performance workaround 18019455067
Ping? On 11/07/2022 14:30, Lionel Landwerlin wrote: Ping? On 30/06/2022 11:35, Lionel Landwerlin wrote: The recommended number of stackIDs for Ray Tracing subsystem is 512 rather than 2048 (default HW programming). v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas) v3: Move programming to general_render_compute_wa_init() (Matt) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 4 drivers/gpu/drm/i915/gt/intel_workarounds.c | 9 + 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..12fc87b957425 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,10 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL _MMIO(0xe530) +#define RT_CTRL_NUMBER_OF_STACKIDS_MASK REG_GENMASK(6, 5) +#define NUMBER_OF_STACKIDS_512 2 +#define NUMBER_OF_STACKIDS_1024 1 +#define NUMBER_OF_STACKIDS_2048 0 #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1 _MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..ea674e456cd76 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2737,6 +2737,15 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB); wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB); } + + if (IS_DG2(i915)) { + /* Performance tuning for Ray-tracing */ + wa_write_clr_set(wal, + RT_CTRL, + RT_CTRL_NUMBER_OF_STACKIDS_MASK, + REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK, + NUMBER_OF_STACKIDS_512)); + } } static void
Re: [Intel-gfx] [PATCH v3] drm/i915/dg2: Add performance workaround 18019455067
Ping? On 30/06/2022 11:35, Lionel Landwerlin wrote: The recommended number of stackIDs for Ray Tracing subsystem is 512 rather than 2048 (default HW programming). v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas) v3: Move programming to general_render_compute_wa_init() (Matt) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 4 drivers/gpu/drm/i915/gt/intel_workarounds.c | 9 + 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..12fc87b957425 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,10 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL _MMIO(0xe530) +#define RT_CTRL_NUMBER_OF_STACKIDS_MASK REG_GENMASK(6, 5) +#define NUMBER_OF_STACKIDS_512 2 +#define NUMBER_OF_STACKIDS_1024 1 +#define NUMBER_OF_STACKIDS_2048 0 #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1_MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..ea674e456cd76 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2737,6 +2737,15 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB); wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB); } + + if (IS_DG2(i915)) { + /* Performance tuning for Ray-tracing */ + wa_write_clr_set(wal, +RT_CTRL, +RT_CTRL_NUMBER_OF_STACKIDS_MASK, +REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK, + NUMBER_OF_STACKIDS_512)); + } } static void
Re: [Intel-gfx] [PATCH 2/2] i915/perf: Disable OA sseu config param for gfx12.50+
On 07/07/2022 22:30, Nerlige Ramappa, Umesh wrote: The global sseu config is applicable only to gen11 platforms where concurrent media, render and OA use cases may cause some subslices to be turned off and hence lose NOA configuration. Ideally we want to return ENODEV for non-gen11 platforms, however, this has shipped with gfx12, so disable only for gfx12.50+. v2: gfx12 is already shipped with this, disable for gfx12.50+ (Lionel) v3: (Matt) - Update commit message and replace "12.5" with "12.50" - Replace DRM_DEBUG() with driver specific drm_dbg() Signed-off-by: Umesh Nerlige Ramappa Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index b3beb89884e0..f3c23fe9ad9c 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3731,6 +3731,13 @@ static int read_properties_unlocked(struct i915_perf *perf, case DRM_I915_PERF_PROP_GLOBAL_SSEU: { struct drm_i915_gem_context_param_sseu user_sseu; + if (GRAPHICS_VER_FULL(perf->i915) >= IP_VER(12, 50)) { + drm_dbg(&perf->i915->drm, + "SSEU config not supported on gfx %x\n", + GRAPHICS_VER_FULL(perf->i915)); + return -ENODEV; + } + if (copy_from_user(&user_sseu, u64_to_user_ptr(value), sizeof(user_sseu))) {
Re: [Intel-gfx] [PATCH 1/2] i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call
On 07/07/2022 22:30, Nerlige Ramappa, Umesh wrote: DRM_DEBUG is not the right debug call to use in i915 OA, replace it with driver specific drm_dbg() call (Matt). Signed-off-by: Umesh Nerlige Ramappa Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 151 --- 1 file changed, 100 insertions(+), 51 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 1577ab6754db..b3beb89884e0 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -885,8 +885,9 @@ static int gen8_oa_read(struct i915_perf_stream *stream, if (ret) return ret; - DRM_DEBUG("OA buffer overflow (exponent = %d): force restart\n", - stream->period_exponent); + drm_dbg(&stream->perf->i915->drm, + "OA buffer overflow (exponent = %d): force restart\n", + stream->period_exponent); stream->perf->ops.oa_disable(stream); stream->perf->ops.oa_enable(stream); @@ -1108,8 +1109,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream, if (ret) return ret; - DRM_DEBUG("OA buffer overflow (exponent = %d): force restart\n", - stream->period_exponent); + drm_dbg(&stream->perf->i915->drm, + "OA buffer overflow (exponent = %d): force restart\n", + stream->period_exponent); stream->perf->ops.oa_disable(stream); stream->perf->ops.oa_enable(stream); @@ -2863,7 +2865,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, int ret; if (!props->engine) { - DRM_DEBUG("OA engine not specified\n"); + drm_dbg(&stream->perf->i915->drm, + "OA engine not specified\n"); return -EINVAL; } @@ -2873,18 +2876,21 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * IDs */ if (!perf->metrics_kobj) { - DRM_DEBUG("OA metrics weren't advertised via sysfs\n"); + drm_dbg(&stream->perf->i915->drm, + "OA metrics weren't advertised via sysfs\n"); return -EINVAL; } if (!(props->sample_flags & SAMPLE_OA_REPORT) && (GRAPHICS_VER(perf->i915) < 12 || !stream->ctx)) { - DRM_DEBUG("Only OA report sampling supported\n"); + 
drm_dbg(&stream->perf->i915->drm, + "Only OA report sampling supported\n"); return -EINVAL; } if (!perf->ops.enable_metric_set) { - DRM_DEBUG("OA unit not supported\n"); + drm_dbg(&stream->perf->i915->drm, + "OA unit not supported\n"); return -ENODEV; } @@ -2894,12 +2900,14 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, * we currently only allow exclusive access */ if (perf->exclusive_stream) { - DRM_DEBUG("OA unit already in use\n"); + drm_dbg(&stream->perf->i915->drm, + "OA unit already in use\n"); return -EBUSY; } if (!props->oa_format) { - DRM_DEBUG("OA report format not specified\n"); + drm_dbg(&stream->perf->i915->drm, + "OA report format not specified\n"); return -EINVAL; } @@ -2929,20 +2937,23 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, if (stream->ctx) { ret = oa_get_render_ctx_id(stream); if (ret) { - DRM_DEBUG("Invalid context id to filter with\n"); + drm_dbg(&stream->perf->i915->drm, + "Invalid context id to filter with\n"); return ret; } } ret = alloc_noa_wait(stream); if (ret) { - DRM_DEBUG("Unable to allocate NOA wait batch buffer\n"); + drm_dbg(&stream->perf->i915->drm, + "Unable to allocate NOA wait batch buffer\n"); goto err_noa_wait_alloc; } stream->oa_config = i915_perf_get_oa_config(perf, props->metrics_set); if (!stream->oa_config) { - DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); +
Re: [Intel-gfx] [PATCH] i915/perf: Disable OA sseu config param for non-gen11 platforms
On 07/07/2022 00:52, Nerlige Ramappa, Umesh wrote: The global sseu config is applicable only to gen11 platforms where concurrent media, render and OA use cases may cause some subslices to be turned off and hence lose NOA configuration. Return ENODEV for non-gen11 platforms. Signed-off-by: Umesh Nerlige Ramappa The problem is that we have gfx12 platforms shipped using this. So I guess we have to disable it on gfx12.5+ where it was never accepted. -Lionel --- drivers/gpu/drm/i915/i915_perf.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 1577ab6754db..512c163fdbeb 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3706,6 +3706,12 @@ static int read_properties_unlocked(struct i915_perf *perf, case DRM_I915_PERF_PROP_GLOBAL_SSEU: { struct drm_i915_gem_context_param_sseu user_sseu; + if (GRAPHICS_VER(perf->i915) != 11) { + DRM_DEBUG("Global SSEU config not supported on gen%d\n", + GRAPHICS_VER(perf->i915)); + return -ENODEV; + } + if (copy_from_user(&user_sseu, u64_to_user_ptr(value), sizeof(user_sseu))) {
Re: [Intel-gfx] [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
On 30/06/2022 20:12, Zanoni, Paulo R wrote: Can you please explain what happens when we try to write to a range that's bound as read-only? It will be mapped as read-only in the device page table. Hence any write access will fail. I would expect a CAT error reported. What's a CAT error? Does this lead to machine freeze or a GPU hang? Let's make sure we document this. Catastrophic error. Reading the documentation, it seems the behavior depends on the context type. With the Legacy 64bit context type, writes are ignored (BSpec 531): - "For legacy context, the access rights are not applicable and should not be considered during page walk." For Advanced 64bit context type, I think the HW will generate a pagefault. -Lionel
Re: [Intel-gfx] [PATCH v2] drm/i915/dg2: Add performance workaround 18019455067
On 30/06/2022 01:16, Matt Roper wrote: On Mon, Jun 27, 2022 at 03:59:28PM +0300, Lionel Landwerlin wrote: The recommended number of stackIDs for Ray Tracing subsystem is 512 rather than 2048 (default HW programming). v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas) I'm not sure this is actually the correct move. As far as I can see on bspec 46261, RT_CTRL isn't part of the engine's context, so we need to make sure it gets added to engine->wa_list instead of engine->ctx_wa_list, otherwise it won't be properly re-applied after engine resets and such. Most of our other tuning values are part of the context image, so this one is a bit unusual. To get it onto the engine->wa_list, the workaround needs to either be defined via rcs_engine_wa_init() or general_render_compute_wa_init(). The latter is the new, preferred location for registers that are part of the render/compute reset domain, but that don't live in the RCS engine's 0x2xxx MMIO range (since all RCS and CCS engines get reset together, the items in general_render_compute_wa_init() will make sure it's dealt with as part of the handling for the first RCS/CCS engine, so that we won't miss out on applying it if the platform doesn't have an RCS). At the moment we don't have too many "tuning" values that we need to set that aren't part of an engine's context, so we don't yet have a dedicated "tuning" function for engine-style workarounds like we do with ctx-style workarounds. Matt Thanks Matt, I didn't pay attention to the register offset and that it's not context/engine specific. 
Moving it to general_render_compute_wa_init() -Lionel Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 4 drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 + 2 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..12fc87b957425 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,10 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL _MMIO(0xe530) +#define RT_CTRL_NUMBER_OF_STACKIDS_MASK REG_GENMASK(6, 5) +#define NUMBER_OF_STACKIDS_512 2 +#define NUMBER_OF_STACKIDS_1024 1 +#define NUMBER_OF_STACKIDS_2048 0 #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1_MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..4d80716b957d4 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -575,6 +575,11 @@ static void dg2_ctx_gt_tuning_init(struct intel_engine_cs *engine, FF_MODE2_TDS_TIMER_MASK, FF_MODE2_TDS_TIMER_128, 0, false); + wa_write_clr_set(wal, +RT_CTRL, +RT_CTRL_NUMBER_OF_STACKIDS_MASK, +REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK, + NUMBER_OF_STACKIDS_512)); } /* -- 2.34.1
[Intel-gfx] [PATCH v3] drm/i915/dg2: Add performance workaround 18019455067
The recommended number of stackIDs for Ray Tracing subsystem is 512 rather than 2048 (default HW programming). v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas) v3: Move programming to general_render_compute_wa_init() (Matt) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 4 drivers/gpu/drm/i915/gt/intel_workarounds.c | 9 + 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..12fc87b957425 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,10 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL_MMIO(0xe530) +#define RT_CTRL_NUMBER_OF_STACKIDS_MASK REG_GENMASK(6, 5) +#define NUMBER_OF_STACKIDS_512 2 +#define NUMBER_OF_STACKIDS_1024 1 +#define NUMBER_OF_STACKIDS_2048 0 #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1 _MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..ea674e456cd76 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2737,6 +2737,15 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB); wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB); } + + if (IS_DG2(i915)) { + /* Performance tuning for Ray-tracing */ + wa_write_clr_set(wal, +RT_CTRL, +RT_CTRL_NUMBER_OF_STACKIDS_MASK, +REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK, + NUMBER_OF_STACKIDS_512)); + } } static void -- 2.34.1
[Intel-gfx] [PATCH v2] drm/i915/dg2: Add performance workaround 18019455067
The recommended number of stackIDs for Ray Tracing subsystem is 512 rather than 2048 (default HW programming). v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 4 drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 + 2 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..12fc87b957425 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,10 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL_MMIO(0xe530) +#define RT_CTRL_NUMBER_OF_STACKIDS_MASK REG_GENMASK(6, 5) +#define NUMBER_OF_STACKIDS_512 2 +#define NUMBER_OF_STACKIDS_1024 1 +#define NUMBER_OF_STACKIDS_2048 0 #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1 _MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..4d80716b957d4 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -575,6 +575,11 @@ static void dg2_ctx_gt_tuning_init(struct intel_engine_cs *engine, FF_MODE2_TDS_TIMER_MASK, FF_MODE2_TDS_TIMER_128, 0, false); + wa_write_clr_set(wal, +RT_CTRL, +RT_CTRL_NUMBER_OF_STACKIDS_MASK, +REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK, + NUMBER_OF_STACKIDS_512)); } /* -- 2.34.1
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 23/06/2022 14:05, Tvrtko Ursulin wrote: On 23/06/2022 09:57, Lionel Landwerlin wrote: On 23/06/2022 11:27, Tvrtko Ursulin wrote: After a vm_unbind, UMD can re-bind to the same VA range against an active VM. Though I am not sure with the Mesa usecase if that new mapping is required for running a GPU job or it will be for the next submission. But by ensuring the tlb flush upon unbind, KMD can ensure correctness. Isn't that their problem? If they re-bind for submitting _new_ work then they get the flush as part of batch buffer pre-amble. In the non-sparse case, if a VA range is unbound, it is invalid to use that range for anything until it has been rebound by something else. We'll take the fence provided by vm_bind and put it as a wait fence on the next execbuffer. It might be safer in case of memory over fetching? TLB flush will have to happen at some point right? What's the alternative to do it in unbind? Currently TLB flush happens from the ring before every BB_START and also when i915 returns the backing store pages to the system. For the former, I haven't seen any mention that for execbuf3 there are plans to stop doing it? Anyway, as long as this is kept and the sequence of bind[1..N]+execbuf is safe and correctly sees all the preceding binds. Hence about the alternative to doing it in unbind - first I think let's state the problem it is trying to solve. For instance is it just for the compute "append work to the running batch" use case? I honestly don't remember how that was supposed to work so maybe the tlb flush on bind was supposed to deal with that scenario? Or do you see a problem even for Mesa with the current model? Regards, Tvrtko As far as I can tell, all the binds should have completed before execbuf starts if you follow the vulkan sparse binding rules. For non-sparse, the UMD will take care of it. I think we're fine. -Lionel
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 22/06/2022 18:12, Niranjana Vishwanathapura wrote: On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote: On 22/06/2022 04:56, Niranjana Vishwanathapura wrote: VM_BIND and related uapi definitions v2: Reduce the scope to simple Mesa use case. v3: Expand VM_UNBIND documentation and add I915_GEM_VM_BIND/UNBIND_FENCE_VALID and I915_GEM_VM_BIND_TLB_FLUSH flags. Signed-off-by: Niranjana Vishwanathapura --- Documentation/gpu/rfc/i915_vm_bind.h | 243 +++ 1 file changed, 243 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h new file mode 100644 index ..fa23b2d7ec6f --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.h @@ -0,0 +1,243 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +/** + * DOC: I915_PARAM_HAS_VM_BIND + * + * VM_BIND feature availability. + * See typedef drm_i915_getparam_t param. + */ +#define I915_PARAM_HAS_VM_BIND 57 + +/** + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND + * + * Flag to opt-in for VM_BIND mode of binding during VM creation. + * See struct drm_i915_gem_vm_control flags. + * + * The older execbuf2 ioctl will not support VM_BIND mode of operation. + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any + * execlist (See struct drm_i915_gem_execbuffer3 for more details). 
+ * + */ +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0) + +/* VM_BIND related ioctls */ +#define DRM_I915_GEM_VM_BIND 0x3d +#define DRM_I915_GEM_VM_UNBIND 0x3e +#define DRM_I915_GEM_EXECBUFFER3 0x3f + +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3) + +/** + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification. + * + * A timeline out fence for vm_bind/unbind completion notification. + */ +struct drm_i915_gem_vm_bind_fence { + /** @handle: User's handle for a drm_syncobj to signal. */ + __u32 handle; + + /** @rsvd: Reserved, MBZ */ + __u32 rsvd; + + /** + * @value: A point in the timeline. + * Value must be 0 for a binary drm_syncobj. A Value of 0 for a + * timeline drm_syncobj is invalid as it turns a drm_syncobj into a + * binary one. + */ + __u64 value; +}; + +/** + * struct drm_i915_gem_vm_bind - VA to object mapping to bind. + * + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU + * virtual address (VA) range to the section of an object that should be bound + * in the device page table of the specified address space (VM). + * The VA range specified must be unique (ie., not currently bound) and can + * be mapped to whole object or a section of the object (partial binding). + * Multiple VA mappings can be created to the same section of the object + * (aliasing). + * + * The @start, @offset and @length should be 4K page aligned. However the DG2 + * and XEHPSDV has 64K page size for device local-memory and has compact page + * table. On those platforms, for binding device local-memory objects, the + * @start should be 2M aligned, @offset and @length should be 64K aligned. 
Should some error codes be documented and has the ability to programmatically probe the alignment restrictions been considered? Currently what we have internally is that -EINVAL is returned if the start, offset and length are not aligned. If the specified mapping already exists, we return -EEXIST. If there are conflicts in the VA range and the VA range can't be reserved, then -ENOSPC is returned. I can add this documentation here. But I am worried that there will be more suggestions/feedback about error codes while reviewing the code patch series, and we have to revisit it again. That's not really a good excuse to not document. + * Also, on those platforms, it is not allowed to bind a device local-memory + * object and a system memory object in a single 2M section of VA range. Text should be clear whether "not allowed" means there will be an error returned, or it will appear to work but bad things will happen. Yah, error returned, will fix. + */ +struct drm_i915_gem_vm_bind { + /** @vm_id: VM (address space) id to bind */ + __u32 vm_id; + + /** @handle: Object handle */ + __u32 handle; + + /** @start: Virtual Address start to bind */ + __u64 start; + + /** @offset: Offset in object to bind */ + __u64 offset; + + /** @length: Length of mapping to bind */ + __u64 length; + + /** + * @flags: Supported flags are: + * + * I915_GEM_VM_BIND_FENCE_VALID: + * @fence is valid, needs bind completion notificati
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 23/06/2022 11:27, Tvrtko Ursulin wrote: After a vm_unbind, UMD can re-bind to the same VA range against an active VM. Though I am not sure with the Mesa usecase if that new mapping is required for running a GPU job or it will be for the next submission. But by ensuring the tlb flush upon unbind, KMD can ensure correctness. Isn't that their problem? If they re-bind for submitting _new_ work then they get the flush as part of batch buffer pre-amble. In the non-sparse case, if a VA range is unbound, it is invalid to use that range for anything until it has been rebound by something else. We'll take the fence provided by vm_bind and put it as a wait fence on the next execbuffer. It might be safer in case of memory over fetching? TLB flush will have to happen at some point right? What's the alternative to do it in unbind? -Lionel
[Intel-gfx] [PATCH] drm/i915/dg2: Add performance workaround 18019455067
This is the recommended value for optimal performance. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 3 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 07ef111947b8c..a50b5790e434e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1112,6 +1112,9 @@ #define GEN12_PUSH_CONST_DEREF_HOLD_DIS REG_BIT(8) #define RT_CTRL_MMIO(0xe530) +#define NUMBER_OF_STACKIDS_512 (2 << 5) +#define NUMBER_OF_STACKIDS_1024 (1 << 5) +#define NUMBER_OF_STACKIDS_2048 (0 << 5) #define DIS_NULL_QUERY REG_BIT(10) #define EU_PERF_CNTL1 _MMIO(0xe558) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 3213c593a55f4..a8a389d36986c 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2106,6 +2106,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * performance guide section. */ wa_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); + +/* Wa_18019455067:dg2 / BSpec 68331/54402 */ +wa_write_or(wal, RT_CTRL, NUMBER_OF_STACKIDS_512); } if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) { -- 2.32.0
Re: [Intel-gfx] [PATCH v2 01/12] drm/doc: add rfc section for small BAR uapi
On 21/06/2022 13:44, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. v4: - Various improvements all over. (Tvrtko) v5: - Include newer integrated platforms when applying the non-recoverable context and error capture restriction. (Thomas) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org Acked-by: Tvrtko Ursulin Acked-by: Akeem G Abodunrin With Jordan with have changes for Anv/Iris : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739 Acked-by: Lionel Landwerlin --- Documentation/gpu/rfc/i915_small_bar.h | 189 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 ++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 240 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..752bb2ceb399 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,189 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */
+struct __drm_i915_memory_region_info {
+	/** @region: The class:instance pair encoding */
+	struct drm_i915_gem_memory_class_instance region;
+
+	/** @rsvd0: MBZ */
+	__u32 rsvd0;
+
+	/**
+	 * @probed_size: Memory probed by the driver (-1 = unknown)
+	 *
+	 * Note that it should not be possible to ever encounter a zero value
+	 * here, also note that no current region type will ever return -1 here.
+	 * Although for future region types, this might be a possibility. The
+	 * same applies to the other size fields.
+	 */
+	__u64 probed_size;
+
+	/**
+	 * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+	 *
+	 * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+	 * Without this (or if this is an older kernel) the value here will
+	 * always equal the @probed_size. Note this is only currently tracked
+	 * for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+	 * will always equal the @probed_size).
+	 */
+	__u64 unallocated_size;
+
+	union {
+		/** @rsvd1: MBZ */
+		__u64 rsvd1[8];
+		struct {
+			/**
+			 * @probed_cpu_visible_size: Memory probed by the driver
+			 * that is CPU accessible. (-1 = unknown).
+			 *
+			 * This will be always be <= @probed_size, and the
+			 * remainder (if there is any) will not be CPU
+			 * accessible.
+			 *
+			 * On systems without small BAR, the @probed_size will
+			 * always equal the @probed_cpu_visible_size, since all
+			 * of it will be CPU accessible.
+			 *
+			 * Note this is only tracked for
+			 * I915_MEMORY_CLASS_DEVICE regions (for other types the
+			 * value here will always equal the @probed_size).
+			 *
+			 * Note that if the value returned here is zero, then
+			 * this must be an old kernel which lacks the relevant
+			 * small-bar uAPI support (including
+			 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+			 * such systems we should never actually end up with a
+			 * small BAR configuration, assuming we are able to load
+			 * the kernel module. Hence it should be safe to treat
+			 * this the same as when @probed_cpu_visible_size ==
+			 * @probed_size.
+			 */
+			__u64 probed_cpu_v
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 13/06/2022 21:02, Niranjana Vishwanathapura wrote: On Mon, Jun 13, 2022 at 06:33:07AM -0700, Zeng, Oak wrote: Regards, Oak -Original Message- From: Intel-gfx On Behalf Of Niranjana Vishwanathapura Sent: June 10, 2022 1:43 PM To: Landwerlin, Lionel G Cc: Intel GFX ; Maling list - DRI developers de...@lists.freedesktop.org>; Hellstrom, Thomas ; Wilson, Chris P ; Vetter, Daniel ; Christian König Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document On Fri, Jun 10, 2022 at 11:18:14AM +0300, Lionel Landwerlin wrote: >On 10/06/2022 10:54, Niranjana Vishwanathapura wrote: >>On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote: >>>On 09/06/2022 22:31, Niranjana Vishwanathapura wrote: >>>>On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote: >>>>> On 09/06/2022 00:55, Jason Ekstrand wrote: >>>>> >>>>> On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura >>>>> wrote: >>>>> >>>>> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote: >>>>> > >>>>> > >>>>> >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote: >>>>> >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana >>>>>Vishwanathapura >>>>> wrote: >>>>> >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason >>>>>Ekstrand wrote: >>>>> >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura >>>>> >>>> wrote: >>>>> >>>> >>>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel >>>>>Landwerlin >>>>> wrote: >>>>> >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote: >>>>> >>>> > >>>>> >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana >>>>>Vishwanathapura >>>>> >>>> > wrote: >>>>> >>>> > >>>>> >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew >>>>> >>>>Brost wrote: >>>>> >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel >>>>> Landwerlin >>>>> >>>> wrote: >>>>> >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura >>>>> wrote: >>>>> >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start >>>>> >>>> binding/unbinding >>>>> >>>> > the mapping in 
an >>>>> >>>> > >> > +async worker. The binding and >>>>>unbinding will >>>>> >>>>work like a >>>>> >>>> special >>>>> >>>> > GPU engine. >>>>> >>>> > >> > +The binding and unbinding operations are >>>>> serialized and >>>>> >>>> will >>>>> >>>> > wait on specified >>>>> >>>> > >> > +input fences before the operation >>>>>and will signal >>>>> the >>>>> >>>> output >>>>> >>>> > fences upon the >>>>> >>>> > >> > +completion of the operation. Due to >>>>> serialization, >>>>> >>>> completion of >>>>> >>>> > an operation >>>>> >>>> > >> > +will also indicate that all >>>>>previous operations >>>>> >>>>are also >>>>> >>>> > complete. >>>>> >>>> > >> >>>>> >>>> > >> I guess we should avoid saying "will >>>>>immediately >>>>> start >>>>> >>>> > binding/unbinding" if >>>>> >>>> > >> there are fences involved. >>>>> >>>> > >> >>>>> >>>> > >> And the fact that i
Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
On 10/06/2022 11:53, Matthew Brost wrote: On Fri, Jun 10, 2022 at 12:07:11AM -0700, Niranjana Vishwanathapura wrote: VM_BIND and related uapi definitions Signed-off-by: Niranjana Vishwanathapura --- Documentation/gpu/rfc/i915_vm_bind.h | 490 +++ 1 file changed, 490 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h new file mode 100644 index ..9fc854969cfb --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.h @@ -0,0 +1,490 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +/** + * DOC: I915_PARAM_HAS_VM_BIND + * + * VM_BIND feature availability. + * See typedef drm_i915_getparam_t param. + * bit[0]: If set, VM_BIND is supported, otherwise not. + * bits[8-15]: VM_BIND implementation version. + * version 0 will not have VM_BIND/UNBIND timeline fence array support. + */ +#define I915_PARAM_HAS_VM_BIND 57 + +/** + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND + * + * Flag to opt-in for VM_BIND mode of binding during VM creation. + * See struct drm_i915_gem_vm_control flags. + * + * The older execbuf2 ioctl will not support VM_BIND mode of operation. + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any + * execlist (See struct drm_i915_gem_execbuffer3 for more details). + * + */ +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0) + +/** + * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING + * + * Flag to declare context as long running. + * See struct drm_i915_gem_context_create_ext flags. + * + * Usage of dma-fence expects that they complete in reasonable amount of time. + * Compute on the other hand can be long running. Hence it is not appropriate + * for compute contexts to export request completion dma-fence to user. + * The dma-fence usage will be limited to in-kernel consumption only. + * Compute contexts need to use user/memory fence. + * + * So, long running contexts do not support output fences. 
Hence, + * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected + * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for + * objects mapped to long running contexts. + */ +#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING (1u << 2) + +/* VM_BIND related ioctls */ +#define DRM_I915_GEM_VM_BIND 0x3d +#define DRM_I915_GEM_VM_UNBIND 0x3e +#define DRM_I915_GEM_EXECBUFFER3 0x3f +#define DRM_I915_GEM_WAIT_USER_FENCE 0x40 + +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3) +#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence) + +/** + * struct drm_i915_gem_vm_bind - VA to object mapping to bind. + * + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU + * virtual address (VA) range to the section of an object that should be bound + * in the device page table of the specified address space (VM). + * The VA range specified must be unique (ie., not currently bound) and can + * be mapped to whole object or a section of the object (partial binding). + * Multiple VA mappings can be created to the same section of the object + * (aliasing). + * + * The @queue_idx specifies the queue to use for binding. Same queue can be + * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind + * operations in a queue are performed in the order of submission. + * + * The @start, @offset and @length should be 4K page aligned. However the DG2 + * and XEHPSDV has 64K page size for device local-memory and has compact page + * table. 
On those platforms, for binding device local-memory objects, the + * @start should be 2M aligned, @offset and @length should be 64K aligned. + * Also, on those platforms, it is not allowed to bind an device local-memory + * object and a system memory object in a single 2M section of VA range. + */ +struct drm_i915_gem_vm_bind { + /** @vm_id: VM (address space) id to bind */ + __u32 vm_id; + + /** @queue_idx: Index of queue for binding */ + __u32 queue_idx; + + /** @rsvd: Reserved, MBZ */ + __u32 rsvd; + + /** @handle: Object handle */ + __u32 handle; + + /** @start: Virtual Address start to bind */ + __u64 start; + + /** @offset: Offset in object to bind */ + __u64 offset; + + /** @length: Length of mapping to bind */ + __u64 length; This probably isn't needed. We are never going to unbind a subset of a VMA are we? That being said it can't hurt as a sanity check (e.g. internal vma->le
Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
On 10/06/2022 13:37, Tvrtko Ursulin wrote: On 10/06/2022 08:07, Niranjana Vishwanathapura wrote: VM_BIND and related uapi definitions Signed-off-by: Niranjana Vishwanathapura --- Documentation/gpu/rfc/i915_vm_bind.h | 490 +++ 1 file changed, 490 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h new file mode 100644 index ..9fc854969cfb --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.h @@ -0,0 +1,490 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +/** + * DOC: I915_PARAM_HAS_VM_BIND + * + * VM_BIND feature availability. + * See typedef drm_i915_getparam_t param. + * bit[0]: If set, VM_BIND is supported, otherwise not. + * bits[8-15]: VM_BIND implementation version. + * version 0 will not have VM_BIND/UNBIND timeline fence array support. + */ +#define I915_PARAM_HAS_VM_BIND 57 + +/** + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND + * + * Flag to opt-in for VM_BIND mode of binding during VM creation. + * See struct drm_i915_gem_vm_control flags. + * + * The older execbuf2 ioctl will not support VM_BIND mode of operation. + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any + * execlist (See struct drm_i915_gem_execbuffer3 for more details). + * + */ +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0) + +/** + * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING + * + * Flag to declare context as long running. + * See struct drm_i915_gem_context_create_ext flags. + * + * Usage of dma-fence expects that they complete in reasonable amount of time. + * Compute on the other hand can be long running. Hence it is not appropriate + * for compute contexts to export request completion dma-fence to user. + * The dma-fence usage will be limited to in-kernel consumption only. + * Compute contexts need to use user/memory fence. + * + * So, long running contexts do not support output fences. 
Hence, + * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected + * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for + * objects mapped to long running contexts. + */ +#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING (1u << 2) + +/* VM_BIND related ioctls */ +#define DRM_I915_GEM_VM_BIND 0x3d +#define DRM_I915_GEM_VM_UNBIND 0x3e +#define DRM_I915_GEM_EXECBUFFER3 0x3f +#define DRM_I915_GEM_WAIT_USER_FENCE 0x40 + +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3) +#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence) + +/** + * struct drm_i915_gem_vm_bind - VA to object mapping to bind. + * + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU + * virtual address (VA) range to the section of an object that should be bound + * in the device page table of the specified address space (VM). + * The VA range specified must be unique (ie., not currently bound) and can + * be mapped to whole object or a section of the object (partial binding). + * Multiple VA mappings can be created to the same section of the object + * (aliasing). + * + * The @queue_idx specifies the queue to use for binding. Same queue can be + * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind + * operations in a queue are performed in the order of submission. + * + * The @start, @offset and @length should be 4K page aligned. However the DG2 + * and XEHPSDV has 64K page size for device local-memory and has compact page + * table. 
On those platforms, for binding device local-memory objects, the + * @start should be 2M aligned, @offset and @length should be 64K aligned. + * Also, on those platforms, it is not allowed to bind an device local-memory + * object and a system memory object in a single 2M section of VA range. + */ +struct drm_i915_gem_vm_bind { + /** @vm_id: VM (address space) id to bind */ + __u32 vm_id; + + /** @queue_idx: Index of queue for binding */ + __u32 queue_idx; I have a question here to which I did not find an answer by browsing the old threads. Queue index appears to be an implicit synchronisation mechanism, right? Operations on the same index are executed/complete in order of ioctl submission? Do we _have_ to implement this on the kernel side and could just allow in/out fence and let userspace deal with it? It orders operations like in a queue. Which is kind of what happens with existing queues/engines. If I understood correctly, it's going to be a kthread + a linked list righ
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 10/06/2022 10:54, Niranjana Vishwanathapura wrote: On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote: On 09/06/2022 22:31, Niranjana Vishwanathapura wrote: On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote: On 09/06/2022 00:55, Jason Ekstrand wrote: On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote: On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote: > > >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote: >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote: >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote: >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura >>>> wrote: >>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote: >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote: >>>> > >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura >>>> > wrote: >>>> > >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew >>>>Brost wrote: >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin >>>> wrote: >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start >>>> binding/unbinding >>>> > the mapping in an >>>> > >> > +async worker. The binding and unbinding will >>>>work like a >>>> special >>>> > GPU engine. >>>> > >> > +The binding and unbinding operations are serialized and >>>> will >>>> > wait on specified >>>> > >> > +input fences before the operation and will signal the >>>> output >>>> > fences upon the >>>> > >> > +completion of the operation. Due to serialization, >>>> completion of >>>> > an operation >>>> > >> > +will also indicate that all previous operations >>>>are also >>>> > complete. >>>> > >> >>>> > >> I guess we should avoid saying "will immediately start >>>> > binding/unbinding" if >>>> > >> there are fences involved. 
>>>> > >> >>>> > >> And the fact that it's happening in an async >>>>worker seem to >>>> imply >>>> > it's not >>>> > >> immediate. >>>> > >> >>>> > >>>> > Ok, will fix. >>>> > This was added because in earlier design binding was deferred >>>> until >>>> > next execbuff. >>>> > But now it is non-deferred (immediate in that sense). >>>>But yah, >>>> this is >>>> > confusing >>>> > and will fix it. >>>> > >>>> > >> >>>> > >> I have a question on the behavior of the bind >>>>operation when >>>> no >>>> > input fence >>>> > >> is provided. Let say I do : >>>> > >> >>>> > >> VM_BIND (out_fence=fence1) >>>> > >> >>>> > >> VM_BIND (out_fence=fence2) >>>> > >> >>>> > >> VM_BIND (out_fence=fence3) >>>> > >> >>>> > >> >>>> > >> In what order are the fences going to be signaled? >>>> > >> >>>> > >> In the order of VM_BIND ioctls? Or out of order? >>>> >
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 09/06/2022 22:31, Niranjana Vishwanathapura wrote: On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote: On 09/06/2022 00:55, Jason Ekstrand wrote: On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote: On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote: > > >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote: >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote: >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote: >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura >>>> wrote: >>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote: >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote: >>>> > >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura >>>> > wrote: >>>> > >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew >>>>Brost wrote: >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin >>>> wrote: >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start >>>> binding/unbinding >>>> > the mapping in an >>>> > >> > +async worker. The binding and unbinding will >>>>work like a >>>> special >>>> > GPU engine. >>>> > >> > +The binding and unbinding operations are serialized and >>>> will >>>> > wait on specified >>>> > >> > +input fences before the operation and will signal the >>>> output >>>> > fences upon the >>>> > >> > +completion of the operation. Due to serialization, >>>> completion of >>>> > an operation >>>> > >> > +will also indicate that all previous operations >>>>are also >>>> > complete. >>>> > >> >>>> > >> I guess we should avoid saying "will immediately start >>>> > binding/unbinding" if >>>> > >> there are fences involved. >>>> > >> >>>> > >> And the fact that it's happening in an async >>>>worker seem to >>>> imply >>>> > it's not >>>> > >> immediate. >>>> > >> >>>> > >>>> > Ok, will fix. 
>>>> > This was added because in earlier design binding was deferred >>>> until >>>> > next execbuff. >>>> > But now it is non-deferred (immediate in that sense). >>>>But yah, >>>> this is >>>> > confusing >>>> > and will fix it. >>>> > >>>> > >> >>>> > >> I have a question on the behavior of the bind >>>>operation when >>>> no >>>> > input fence >>>> > >> is provided. Let say I do : >>>> > >> >>>> > >> VM_BIND (out_fence=fence1) >>>> > >> >>>> > >> VM_BIND (out_fence=fence2) >>>> > >> >>>> > >> VM_BIND (out_fence=fence3) >>>> > >> >>>> > >> >>>> > >> In what order are the fences going to be signaled? >>>> > >> >>>> > >> In the order of VM_BIND ioctls? Or out of order? >>>> > >> >>>> > >> Because you wrote "serialized I assume it's : in order >
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 09/06/2022 00:55, Jason Ekstrand wrote: On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote: On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote: > > >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote: >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote: >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote: >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura >>>> wrote: >>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote: >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote: >>>> > >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura >>>> > wrote: >>>> > >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew >>>>Brost wrote: >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin >>>> wrote: >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start >>>> binding/unbinding >>>> > the mapping in an >>>> > >> > +async worker. The binding and unbinding will >>>>work like a >>>> special >>>> > GPU engine. >>>> > >> > +The binding and unbinding operations are serialized and >>>> will >>>> > wait on specified >>>> > >> > +input fences before the operation and will signal the >>>> output >>>> > fences upon the >>>> > >> > +completion of the operation. Due to serialization, >>>> completion of >>>> > an operation >>>> > >> > +will also indicate that all previous operations >>>>are also >>>> > complete. >>>> > >> >>>> > >> I guess we should avoid saying "will immediately start >>>> > binding/unbinding" if >>>> > >> there are fences involved. >>>> > >> >>>> > >> And the fact that it's happening in an async >>>>worker seem to >>>> imply >>>> > it's not >>>> > >> immediate. >>>> > >> >>>> > >>>> > Ok, will fix. >>>> > This was added because in earlier design binding was deferred >>>> until >>>> > next execbuff. >>>> > But now it is non-deferred (immediate in that sense). 
>>>>But yah, >>>> this is >>>> > confusing >>>> > and will fix it. >>>> > >>>> > >> >>>> > >> I have a question on the behavior of the bind >>>>operation when >>>> no >>>> > input fence >>>> > >> is provided. Let say I do : >>>> > >> >>>> > >> VM_BIND (out_fence=fence1) >>>> > >> >>>> > >> VM_BIND (out_fence=fence2) >>>> > >> >>>> > >> VM_BIND (out_fence=fence3) >>>> > >> >>>> > >> >>>> > >> In what order are the fences going to be signaled? >>>> > >> >>>> > >> In the order of VM_BIND ioctls? Or out of order? >>>> > >> >>>> > >> Because you wrote "serialized I assume it's : in order >>>> > >> >>>> > >>>> > Yes, in the order of VM_BIND/UNBIND ioctls. Note that >>>>bind and >>>> unbind >>>> > will use >>>> > the same queue and hence are ord
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 08/06/2022 11:36, Tvrtko Ursulin wrote: On 08/06/2022 07:40, Lionel Landwerlin wrote: On 03/06/2022 09:53, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote: On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote: On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote: On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote: >On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote: >> VM_BIND and related uapi definitions >> >> v2: Ensure proper kernel-doc formatting with cross references. >> Also add new uapi and documentation as per review comments >> from Daniel. >> >> Signed-off-by: Niranjana Vishwanathapura >> --- >> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++ >> 1 file changed, 399 insertions(+) >> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h >> >> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h >> new file mode 100644 >> index ..589c0a009107 >> --- /dev/null >> +++ b/Documentation/gpu/rfc/i915_vm_bind.h >> @@ -0,0 +1,399 @@ >> +/* SPDX-License-Identifier: MIT */ >> +/* >> + * Copyright © 2022 Intel Corporation >> + */ >> + >> +/** >> + * DOC: I915_PARAM_HAS_VM_BIND >> + * >> + * VM_BIND feature availability. >> + * See typedef drm_i915_getparam_t param. >> + */ >> +#define I915_PARAM_HAS_VM_BIND 57 >> + >> +/** >> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND >> + * >> + * Flag to opt-in for VM_BIND mode of binding during VM creation. >> + * See struct drm_i915_gem_vm_control flags. >> + * >> + * A VM in VM_BIND mode will not support the older execbuff mode of binding. >> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the >> + * &drm_i915_gem_execbuffer2.buffer_count must be 0). >> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and >> + * &drm_i915_gem_execbuffer2.batch_len must be 0. 
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided >> + * to pass in the batch buffer addresses. >> + * >> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and >> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0 >> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be >> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses). >> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields >> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0. >> + */ > >From that description, it seems we have: > >struct drm_i915_gem_execbuffer2 { > __u64 buffers_ptr; -> must be 0 (new) > __u32 buffer_count; -> must be 0 (new) > __u32 batch_start_offset; -> must be 0 (new) > __u32 batch_len; -> must be 0 (new) > __u32 DR1; -> must be 0 (old) > __u32 DR4; -> must be 0 (old) > __u32 num_cliprects; (fences) -> must be 0 since using extensions > __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer! > __u64 flags; -> some flags must be 0 (new) > __u64 rsvd1; (context info) -> repurposed field (old) > __u64 rsvd2; -> unused >}; > >Based on that, why can't we just get drm_i915_gem_execbuffer3 instead >of adding even more complexity to an already abused interface? While >the Vulkan-like extension thing is really nice, I don't think what >we're doing here is extending the ioctl usage, we're completely >changing how the base struct should be interpreted based on how the VM >was created (which is an entirely different ioctl). > >From Rusty Russel's API Design grading, drm_i915_gem_execbuffer2 is >already at -6 without these changes. I think after vm_bind we'll need >to create a -11 entry just to deal with this ioctl. > The only change here is removing the execlist support for VM_BIND mode (other than natual extensions). 
Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND). Why not? it's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future req
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 03/06/2022 09:53, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote: On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote: On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote: On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote: >On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote: >> VM_BIND and related uapi definitions >> >> v2: Ensure proper kernel-doc formatting with cross references. >> Also add new uapi and documentation as per review comments >> from Daniel. >> >> Signed-off-by: Niranjana Vishwanathapura >> --- >> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++ >> 1 file changed, 399 insertions(+) >> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h >> >> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h >> new file mode 100644 >> index ..589c0a009107 >> --- /dev/null >> +++ b/Documentation/gpu/rfc/i915_vm_bind.h >> @@ -0,0 +1,399 @@ >> +/* SPDX-License-Identifier: MIT */ >> +/* >> + * Copyright © 2022 Intel Corporation >> + */ >> + >> +/** >> + * DOC: I915_PARAM_HAS_VM_BIND >> + * >> + * VM_BIND feature availability. >> + * See typedef drm_i915_getparam_t param. >> + */ >> +#define I915_PARAM_HAS_VM_BIND 57 >> + >> +/** >> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND >> + * >> + * Flag to opt-in for VM_BIND mode of binding during VM creation. >> + * See struct drm_i915_gem_vm_control flags. >> + * >> + * A VM in VM_BIND mode will not support the older execbuff mode of binding. >> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the >> + * &drm_i915_gem_execbuffer2.buffer_count must be 0). >> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and >> + * &drm_i915_gem_execbuffer2.batch_len must be 0. >> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided >> + * to pass in the batch buffer addresses. 
>> + * >> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and >> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0 >> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be >> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses). >> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields >> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0. >> + */ > >From that description, it seems we have: > >struct drm_i915_gem_execbuffer2 { > __u64 buffers_ptr; -> must be 0 (new) > __u32 buffer_count; -> must be 0 (new) > __u32 batch_start_offset; -> must be 0 (new) > __u32 batch_len; -> must be 0 (new) > __u32 DR1; -> must be 0 (old) > __u32 DR4; -> must be 0 (old) > __u32 num_cliprects; (fences) -> must be 0 since using extensions > __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer! > __u64 flags; -> some flags must be 0 (new) > __u64 rsvd1; (context info) -> repurposed field (old) > __u64 rsvd2; -> unused >}; > >Based on that, why can't we just get drm_i915_gem_execbuffer3 instead >of adding even more complexity to an already abused interface? While >the Vulkan-like extension thing is really nice, I don't think what >we're doing here is extending the ioctl usage, we're completely >changing how the base struct should be interpreted based on how the VM >was created (which is an entirely different ioctl). > >From Rusty Russel's API Design grading, drm_i915_gem_execbuffer2 is >already at -6 without these changes. I think after vm_bind we'll need >to create a -11 entry just to deal with this ioctl. > The only change here is removing the execlist support for VM_BIND mode (other than natual extensions). Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND). Why not? 
it's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future requirements. Just to burn down the old requirements and pointless fields. Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the older sw on execbuf2 for ever. I guess another point in favour of execbuf3 would be that it's less midlayer. If we share the entry point then there's quite a few vfuncs needed to cleanly split out the vm_bind paths from the legacy reloc/softping paths. If we invert this and do execbuf3, then there's the existing ioctl vfunc, and then we share code (where it even makes sense, probably request setup/submit need to be shared, a
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 08/06/2022 09:40, Lionel Landwerlin wrote: On 03/06/2022 09:53, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote: On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote: On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote: On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote: >On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote: >> VM_BIND and related uapi definitions >> >> v2: Ensure proper kernel-doc formatting with cross references. >> Also add new uapi and documentation as per review comments >> from Daniel. >> >> Signed-off-by: Niranjana Vishwanathapura >> --- >> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++ >> 1 file changed, 399 insertions(+) >> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h >> >> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h >> new file mode 100644 >> index ..589c0a009107 >> --- /dev/null >> +++ b/Documentation/gpu/rfc/i915_vm_bind.h >> @@ -0,0 +1,399 @@ >> +/* SPDX-License-Identifier: MIT */ >> +/* >> + * Copyright © 2022 Intel Corporation >> + */ >> + >> +/** >> + * DOC: I915_PARAM_HAS_VM_BIND >> + * >> + * VM_BIND feature availability. >> + * See typedef drm_i915_getparam_t param. >> + */ >> +#define I915_PARAM_HAS_VM_BIND 57 >> + >> +/** >> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND >> + * >> + * Flag to opt-in for VM_BIND mode of binding during VM creation. >> + * See struct drm_i915_gem_vm_control flags. >> + * >> + * A VM in VM_BIND mode will not support the older execbuff mode of binding. >> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the >> + * &drm_i915_gem_execbuffer2.buffer_count must be 0). >> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and >> + * &drm_i915_gem_execbuffer2.batch_len must be 0. 
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided >> + * to pass in the batch buffer addresses. >> + * >> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and >> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0 >> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be >> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses). >> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields >> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0. >> + */ > >From that description, it seems we have: > >struct drm_i915_gem_execbuffer2 { > __u64 buffers_ptr; -> must be 0 (new) > __u32 buffer_count; -> must be 0 (new) > __u32 batch_start_offset; -> must be 0 (new) > __u32 batch_len; -> must be 0 (new) > __u32 DR1; -> must be 0 (old) > __u32 DR4; -> must be 0 (old) > __u32 num_cliprects; (fences) -> must be 0 since using extensions > __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer! > __u64 flags; -> some flags must be 0 (new) > __u64 rsvd1; (context info) -> repurposed field (old) > __u64 rsvd2; -> unused >}; > >Based on that, why can't we just get drm_i915_gem_execbuffer3 instead >of adding even more complexity to an already abused interface? While >the Vulkan-like extension thing is really nice, I don't think what >we're doing here is extending the ioctl usage, we're completely >changing how the base struct should be interpreted based on how the VM >was created (which is an entirely different ioctl). > >From Rusty Russel's API Design grading, drm_i915_gem_execbuffer2 is >already at -6 without these changes. I think after vm_bind we'll need >to create a -11 entry just to deal with this ioctl. > The only change here is removing the execlist support for VM_BIND mode (other than natual extensions). 
Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND).

Why not? It's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future requirements. Just to burn down the old requirements and pointless fields. Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the older sw on execbuf2 for ever.

I guess another point in favour of execbuf3 would be that it's less midlayer. If we share the entry point then there's quite a few vfuncs needed to cleanly split out the vm_bind paths from the legacy reloc/softpin paths. If we invert this and do execbuf3, then there's the existing ioctl vfunc, and then we share code (where it even makes sense, probably request setup/submit need to be shared, a
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 02/06/2022 23:35, Jason Ekstrand wrote: On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote: >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote: >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an >> > +async worker. The binding and unbinding will work like a special GPU engine. >> > +The binding and unbinding operations are serialized and will wait on specified >> > +input fences before the operation and will signal the output fences upon the >> > +completion of the operation. Due to serialization, completion of an operation >> > +will also indicate that all previous operations are also complete. >> >> I guess we should avoid saying "will immediately start binding/unbinding" if >> there are fences involved. >> >> And the fact that it's happening in an async worker seem to imply it's not >> immediate. >> Ok, will fix. This was added because in earlier design binding was deferred until next execbuff. But now it is non-deferred (immediate in that sense). But yah, this is confusing and will fix it. >> >> I have a question on the behavior of the bind operation when no input fence >> is provided. Let say I do : >> >> VM_BIND (out_fence=fence1) >> >> VM_BIND (out_fence=fence2) >> >> VM_BIND (out_fence=fence3) >> >> >> In what order are the fences going to be signaled? >> >> In the order of VM_BIND ioctls? Or out of order? >> >> Because you wrote "serialized I assume it's : in order >> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind will use the same queue and hence are ordered. >> >> One thing I didn't realize is that because we only get one "VM_BIND" engine, >> there is a disconnect from the Vulkan specification. >> >> In Vulkan VM_BIND operations are serialized but per engine. 
>> >> So you could have something like this : >> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2) >> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) >> >> >> fence1 is not signaled >> >> fence3 is signaled >> >> So the second VM_BIND will proceed before the first VM_BIND. >> >> >> I guess we can deal with that scenario in userspace by doing the wait >> ourselves in one thread per engines. >> >> But then it makes the VM_BIND input fences useless. >> >> >> Daniel : what do you think? Should be rework this or just deal with wait >> fences in userspace? >> > >My opinion is rework this but make the ordering via an engine param optional. > >e.g. A VM can be configured so all binds are ordered within the VM > >e.g. A VM can be configured so all binds accept an engine argument (in >the case of the i915 likely this is a gem context handle) and binds >ordered with respect to that engine. > >This gives UMDs options as the later likely consumes more KMD resources >so if a different UMD can live with binds being ordered within the VM >they can use a mode consuming less resources. > I think we need to be careful here if we are looking for some out of (submission) order completion of vm_bind/unbind. In-order completion means, in a batch of binds and unbinds to be completed in-order, user only needs to specify in-fence for the first bind/unbind call and the our-fence for the last bind/unbind call. Also, the VA released by an unbind call can be re-used by any subsequent bind call in that in-order batch. These things will break if binding/unbinding were to be allowed to go out of order (of submission) and user need to be extra careful not to run into pre-mature triggereing of out-fence and bind failing as VA is still in use etc. Also, VM_BIND binds the provided mapping on the specified address space (VM). So, the uapi is not engine/context specific. 
We can however add a 'queue' to the uapi which can be one from the pre-defined queues, I915_VM_BIND_QUEUE_0 I915_VM_BIND_QUEUE_1 ... I915_VM_BIND_QUEUE_(N-1) KMD will spawn an async work queue for each queue which will only bind
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 02/06/2022 00:18, Matthew Brost wrote: On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote: On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an +async worker. The binding and unbinding will work like a special GPU engine. +The binding and unbinding operations are serialized and will wait on specified +input fences before the operation and will signal the output fences upon the +completion of the operation. Due to serialization, completion of an operation +will also indicate that all previous operations are also complete. I guess we should avoid saying "will immediately start binding/unbinding" if there are fences involved. And the fact that it's happening in an async worker seem to imply it's not immediate. I have a question on the behavior of the bind operation when no input fence is provided. Let say I do : VM_BIND (out_fence=fence1) VM_BIND (out_fence=fence2) VM_BIND (out_fence=fence3) In what order are the fences going to be signaled? In the order of VM_BIND ioctls? Or out of order? Because you wrote "serialized I assume it's : in order One thing I didn't realize is that because we only get one "VM_BIND" engine, there is a disconnect from the Vulkan specification. In Vulkan VM_BIND operations are serialized but per engine. So you could have something like this : VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2) VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) Question - let's say this done after the above operations: EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL) Is the exec ordered with respected to bind (i.e. would fence3 & 4 be signaled before the exec starts)? Matt Hi Matt, From the vulkan point of view, everything is serialized within an engine (we map that to a VkQueue). So with : EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL) VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) EXEC completes first then VM_BIND executes. 
To be even clearer : EXEC (engine=ccs0, in_fence=fence2, out_fence=NULL) VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) EXEC will wait until fence2 is signaled. Once fence2 is signaled, EXEC proceeds, finishes and only after it is done, VM_BIND executes. It would kind of like having the VM_BIND operation be another batch executed from the ringbuffer buffer. -Lionel fence1 is not signaled fence3 is signaled So the second VM_BIND will proceed before the first VM_BIND. I guess we can deal with that scenario in userspace by doing the wait ourselves in one thread per engines. But then it makes the VM_BIND input fences useless. Daniel : what do you think? Should be rework this or just deal with wait fences in userspace? Sorry I noticed this late. -Lionel
Re: [Intel-gfx] [PATCH v2 2/6] drm/i915/xehp: Drop GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK
On 17/05/2022 06:20, Matt Roper wrote: Slice/subslice/EU information should be obtained via the topology queries provided by the I915_QUERY interface; let's turn off support for the old GETPARAM lookups on Xe_HP and beyond where we can't return meaningful values. The slice mask lookup is meaningless since Xe_HP doesn't support traditional slices (and we make no attempt to return the various new units like gslices, cslices, mslices, etc.) here. The subslice mask lookup is even more problematic; given the distinct masks for geometry vs compute purposes, the combined mask returned here is likely not what userspace would want to act upon anyway. The value is also limited to 32-bits by the nature of the GETPARAM ioctl which is sufficient for the initial Xe_HP platforms, but is unable to convey the larger masks that will be needed on other upcoming platforms. Finally, the value returned here becomes even less meaningful when used on multi-tile platforms where each tile will have its own masks. Signed-off-by: Matt Roper Sounds fair. We've been relying on the topology query in Mesa since it's available and it's a requirement for Gfx10+. FYI, we're also not using I915_PARAM_EU_TOTAL on Gfx10+ for the same reason. 
Acked-by: Lionel Landwerlin
---
 drivers/gpu/drm/i915/i915_getparam.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index c12a0adefda5..ac9767c56619 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -148,11 +148,19 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = intel_engines_has_context_isolation(i915);
 		break;
 	case I915_PARAM_SLICE_MASK:
+		/* Not supported from Xe_HP onward; use topology queries */
+		if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+			return -EINVAL;
+
 		value = sseu->slice_mask;
 		if (!value)
 			return -ENODEV;
 		break;
 	case I915_PARAM_SUBSLICE_MASK:
+		/* Not supported from Xe_HP onward; use topology queries */
+		if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+			return -EINVAL;
+
 		/* Only copy bits from the first slice */
 		memcpy(&value, sseu->subslice_mask,
 		       min(sseu->ss_stride, (u8)sizeof(value)));
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 30/05/2022 14:40, Christian König wrote: Am 30.05.22 um 12:09 schrieb Lionel Landwerlin: On 30/05/2022 12:52, Christian König wrote: Am 25.05.22 um 23:59 schrieb Lucas De Marchi: On Wed, May 25, 2022 at 12:38:51PM +0200, Christian König wrote: Am 25.05.22 um 11:35 schrieb Lionel Landwerlin: [SNIP] Err... Let's double check with my colleagues. It seems we're running into a test failure in IGT with this patch, but now I have doubts that it's where the problem lies. Yeah, exactly that's what I couldn't understand as well. What you describe above should still work fine. Thanks for taking a look into this, Christian. With some additional prints: [ 210.742634] Console: switching to colour dummy device 80x25 [ 210.742686] [IGT] syncobj_timeline: executing [ 210.756988] [IGT] syncobj_timeline: starting subtest transfer-timeline-point [ 210.757364] [drm:drm_syncobj_transfer_ioctl] *ERROR* adding fence0 signaled=1 [ 210.764543] [drm:drm_syncobj_transfer_ioctl] *ERROR* resulting array fence signaled=0 [ 210.800469] [IGT] syncobj_timeline: exiting, ret=98 [ 210.825426] Console: switching to colour frame buffer device 240x67 still learning this part of the code but AFAICS the problem is because when we are creating the array, the 'signaled' doesn't propagate to the array. Yeah, but that is intentionally. The array should only signal when requested. I still don't get what the test case here is checking. There must be something I don't know about fence arrays. You seem to say that creating an array of signaled fences will not make the array signaled. Exactly that, yes. The array delays it's signaling until somebody asks for it. In other words the fences inside the array are check only after someone calls dma_fence_enable_sw_signaling() which in turn calls dma_fence_array_enable_signaling(). It is certainly possible that nobody does that in the drm_syncobj and because of this the array never signals. Regards, Christian. 
Thanks,

Yeah I guess dma_fence_enable_sw_signaling() is never called for sw_sync. Don't we also want to call it right at the end of drm_syncobj_flatten_chain()?

-Lionel

This is the situation with this IGT test. We started with a syncobj with point 1 & 2 signaled. We take point 2 and import it as a new point 3 on the same syncobj. We expect point 3 to be signaled as well and it's not.

Thanks,
-Lionel

Regards,
Christian.

dma_fence_array_create()
{
	...
	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
	...
}

This is not considering the fact that some of the fences could already have been signaled as is the case in the igt@syncobj_timeline@transfer-timeline-point test.

See https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11693/shard-dg1-12/igt@syncobj_timel...@transfer-timeline-point.html

Quick patch on this function fixes it for me:

-8<
Subject: [PATCH] dma-buf: Honor already signaled fences on array creation

When creating an array, array->num_pending is marked with the number of fences. However the fences could already have been signaled. Propagate num_pending to the array by looking at each individual fence the array contains.

Signed-off-by: Lucas De Marchi
---
 drivers/dma-buf/dma-fence-array.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5c8a7084577b..32f491c32fa0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -158,6 +158,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
 {
 	struct dma_fence_array *array;
 	size_t size = sizeof(*array);
+	unsigned num_pending = 0;
+	struct dma_fence **f;
 
 	WARN_ON(!num_fences || !fences);
 
@@ -173,7 +175,14 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
 	init_irq_work(&array->work, irq_dma_fence_array_work);
 
 	array->num_fences = num_fences;
-	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
+
+	for (f = fences; f < fences + num_fences; f++)
+		num_pending += !dma_fence_is_signaled(*f);
+
+	if (signal_on_any)
+		num_pending = !!num_pending;
+
+	atomic_set(&array->num_pending, num_pending);
 
 	array->fences = fences;
 	array->base.error = PENDING_ERROR;
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 25/05/2022 12:26, Lionel Landwerlin wrote: On 25/05/2022 11:24, Christian König wrote: Am 25.05.22 um 08:47 schrieb Lionel Landwerlin: On 09/02/2022 20:26, Christian König wrote: It is illegal to add a dma_fence_chain as timeline point. Flatten out the fences into a dma_fence_array instead. Signed-off-by: Christian König --- drivers/gpu/drm/drm_syncobj.c | 61 --- 1 file changed, 56 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c313a5b4549c..7e48dcd1bee4 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); } + +/* + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be + * added as timeline fence to a chain again. + */ +static int drm_syncobj_flatten_chain(struct dma_fence **f) +{ + struct dma_fence_chain *chain = to_dma_fence_chain(*f); + struct dma_fence *tmp, **fences; + struct dma_fence_array *array; + unsigned int count; + + if (!chain) + return 0; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + ++count; + + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL); + if (!fences) + return -ENOMEM; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + fences[count++] = dma_fence_get(tmp); + + array = dma_fence_array_create(count, fences, + dma_fence_context_alloc(1), Hi Christian, Sorry for the late answer to this. It appears this commit is trying to remove the warnings added by "dma-buf: Warn about dma_fence_chain container rules" Yes, correct. We are now enforcing some rules with warnings and this here bubbled up. But the context allocation you added just above is breaking some tests. In particular igt@syncobj_timeline@transfer-timeline-point That test transfer points into the timeline at point 3 and expects that we'll still on the previous points to complete. Hui what? 
I don't understand the problem you are seeing here. What exactly is the test doing? In my opinion we should be reusing the previous context number if there is one and only allocate if we don't have a point. Scratching my head what you mean with that. The functionality transfers a synchronization fence from one timeline to another. So as far as I can see the new point should be part of the timeline of the syncobj we are transferring to. If the application wants to not depend on previous points for wait operations, it can reset the syncobj prior to adding a new point. Well we should never lose synchronization. So what happens is that when we do the transfer all the fences of the source are flattened out into an array. And that array is then added as new point into the destination timeline. In this case would be broken : syncobjA <- signal point 1 syncobjA <- import syncobjB point 1 into syncobjA point 2 syncobjA <- query returns 0 -Lionel Err... Let's double check with my colleagues. It seems we're running into a test failure in IGT with this patch, but now I have doubts that it's where the problem lies. -Lionel Where exactly is the problem? Regards, Christian. 
Cheers,
-Lionel

+				       1, false);
+	if (!array)
+		goto free_fences;
+
+	dma_fence_put(*f);
+	*f = &array->base;
+	return 0;
+
+free_fences:
+	while (count--)
+		dma_fence_put(fences[count]);
+
+	kfree(fences);
+	return -ENOMEM;
+}
+
 static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
 					    struct drm_syncobj_transfer *args)
 {
 	struct drm_syncobj *timeline_syncobj = NULL;
-	struct dma_fence *fence;
 	struct dma_fence_chain *chain;
+	struct dma_fence *fence;
 	int ret;
 
 	timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
@@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
 			     args->src_point, args->flags, &fence);
 	if (ret)
-		goto err;
+		goto err_put_timeline;
+
+	ret = drm_syncobj_flatten_chain(&fence);
+	if (ret)
+		goto err_free_fence;
+
 	chain = dma_fence_chain_alloc();
 	if (!chain) {
 		ret = -ENOMEM;
-		goto err1;
+		goto err_free_fence;
 	}
+
 	drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
 
-err1:
+err_free_fence:
 	dma_fence_put(fence);
-err:
+err_put_timeline:
 	drm_syncobj_put(timeline_syncobj);
 	return ret;
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 25/05/2022 11:24, Christian König wrote:

On 25.05.22 08:47, Lionel Landwerlin wrote:

On 09/02/2022 20:26, Christian König wrote:

It is illegal to add a dma_fence_chain as timeline point. Flatten out the fences into a dma_fence_array instead. Signed-off-by: Christian König --- drivers/gpu/drm/drm_syncobj.c | 61 --- 1 file changed, 56 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c313a5b4549c..7e48dcd1bee4 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); } + +/* + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be + * added as timeline fence to a chain again. + */ +static int drm_syncobj_flatten_chain(struct dma_fence **f) +{ + struct dma_fence_chain *chain = to_dma_fence_chain(*f); + struct dma_fence *tmp, **fences; + struct dma_fence_array *array; + unsigned int count; + + if (!chain) + return 0; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + ++count; + + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL); + if (!fences) + return -ENOMEM; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + fences[count++] = dma_fence_get(tmp); + + array = dma_fence_array_create(count, fences, + dma_fence_context_alloc(1),

Hi Christian,

Sorry for the late answer to this. It appears this commit is trying to remove the warnings added by "dma-buf: Warn about dma_fence_chain container rules".

Yes, correct. We are now enforcing some rules with warnings and this here bubbled up.

But the context allocation you added just above is breaking some tests. In particular igt@syncobj_timeline@transfer-timeline-point. That test transfers points into the timeline at point 3 and expects that we'll still wait on the previous points to complete.

Huh, what? I don't understand the problem you are seeing here. What exactly is the test doing?
In my opinion we should be reusing the previous context number if there is one and only allocate if we don't have a point.

Scratching my head what you mean with that. The functionality transfers a synchronization fence from one timeline to another. So as far as I can see the new point should be part of the timeline of the syncobj we are transferring to.

If the application wants to not depend on previous points for wait operations, it can reset the syncobj prior to adding a new point.

Well, we should never lose synchronization. So what happens is that when we do the transfer, all the fences of the source are flattened out into an array. And that array is then added as a new point into the destination timeline.

In this case this would be broken:
syncobjA <- signal point 1
syncobjA <- import syncobjB point 1 into syncobjA point 2
syncobjA <- query returns 0
-Lionel

Where exactly is the problem? Regards, Christian.

Cheers, -Lionel + 1, false); + if (!array) + goto free_fences; + + dma_fence_put(*f); + *f = &array->base; + return 0; + +free_fences: + while (count--) + dma_fence_put(fences[count]); + + kfree(fences); + return -ENOMEM; +} + static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, struct drm_syncobj_transfer *args) { struct drm_syncobj *timeline_syncobj = NULL; - struct dma_fence *fence; struct dma_fence_chain *chain; + struct dma_fence *fence; int ret; timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle); @@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) - goto err; + goto err_put_timeline; + + ret = drm_syncobj_flatten_chain(&fence); + if (ret) + goto err_free_fence; + chain = dma_fence_chain_alloc(); if (!chain) { ret = -ENOMEM; - goto err1; + goto err_free_fence; } + drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point); -err1: +err_free_fence: dma_fence_put(fence); -err: +err_put_timeline:
drm_syncobj_put(timeline_syncobj); return ret;
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 09/02/2022 20:26, Christian König wrote:

It is illegal to add a dma_fence_chain as timeline point. Flatten out the fences into a dma_fence_array instead. Signed-off-by: Christian König --- drivers/gpu/drm/drm_syncobj.c | 61 --- 1 file changed, 56 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c313a5b4549c..7e48dcd1bee4 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); } + +/* + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be + * added as timeline fence to a chain again. + */ +static int drm_syncobj_flatten_chain(struct dma_fence **f) +{ + struct dma_fence_chain *chain = to_dma_fence_chain(*f); + struct dma_fence *tmp, **fences; + struct dma_fence_array *array; + unsigned int count; + + if (!chain) + return 0; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + ++count; + + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL); + if (!fences) + return -ENOMEM; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + fences[count++] = dma_fence_get(tmp); + + array = dma_fence_array_create(count, fences, + dma_fence_context_alloc(1),

Hi Christian,

Sorry for the late answer to this. It appears this commit is trying to remove the warnings added by "dma-buf: Warn about dma_fence_chain container rules".

But the context allocation you added just above is breaking some tests. In particular igt@syncobj_timeline@transfer-timeline-point. That test transfers points into the timeline at point 3 and expects that we'll still wait on the previous points to complete.

In my opinion we should be reusing the previous context number if there is one and only allocate if we don't have a point.

If the application wants to not depend on previous points for wait operations, it can reset the syncobj prior to adding a new point.
Cheers, -Lionel + 1, false); + if (!array) + goto free_fences; + + dma_fence_put(*f); + *f = &array->base; + return 0; + +free_fences: + while (count--) + dma_fence_put(fences[count]); + + kfree(fences); + return -ENOMEM; +} + static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, struct drm_syncobj_transfer *args) { struct drm_syncobj *timeline_syncobj = NULL; - struct dma_fence *fence; struct dma_fence_chain *chain; + struct dma_fence *fence; int ret; timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle); @@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) - goto err; + goto err_put_timeline; + + ret = drm_syncobj_flatten_chain(&fence); + if (ret) + goto err_free_fence; + chain = dma_fence_chain_alloc(); if (!chain) { ret = -ENOMEM; - goto err1; + goto err_free_fence; } + drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point); -err1: +err_free_fence: dma_fence_put(fence); -err: +err_put_timeline: drm_syncobj_put(timeline_syncobj); return ret;
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 20/05/2022 01:52, Zanoni, Paulo R wrote:

On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:

VM_BIND design document with description of intended use cases. v2: Add more documentation and format as per review comments from Daniel. Signed-off-by: Niranjana Vishwanathapura --- diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst new file mode 100644 index ..f1be560d313c --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.rst @@ -0,0 +1,304 @@ +== +I915 VM_BIND feature design and use cases +== + +VM_BIND feature + +DRM_I915_GEM_VM_BIND/UNBIND ioctls allow UMD to bind/unbind GEM buffer +objects (BOs) or sections of a BO at specified GPU virtual addresses on a +specified address space (VM). These mappings (also referred to as persistent +mappings) will be persistent across multiple GPU submissions (execbuff calls) +issued by the UMD, without the user having to provide a list of all required +mappings during each submission (as required by older execbuff mode). + +VM_BIND/UNBIND ioctls will support 'in' and 'out' fences to allow userspace +to specify how the binding/unbinding should sync with other operations +like the GPU job submission. These fences will be timeline 'drm_syncobj's +for non-Compute contexts (See struct drm_i915_vm_bind_ext_timeline_fences). +For Compute contexts, they will be user/memory fences (See struct +drm_i915_vm_bind_ext_user_fence). + +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND. +User has to opt-in for VM_BIND mode of binding for an address space (VM) +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension. + +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an +async worker. The binding and unbinding will work like a special GPU engine. +The binding and unbinding operations are serialized and will wait on specified +input fences before the operation and will signal the output fences upon the +completion of the operation.
Due to serialization, completion of an operation +will also indicate that all previous operations are also complete. + +VM_BIND features include: + +* Multiple Virtual Address (VA) mappings can map to the same physical pages + of an object (aliasing). +* VA mapping can map to a partial section of the BO (partial binding). +* Support capture of persistent mappings in the dump upon GPU error. +* TLB is flushed upon unbind completion. Batching of TLB flushes in some + use cases will be helpful. +* Asynchronous vm_bind and vm_unbind support with 'in' and 'out' fences. +* Support for userptr gem objects (no special uapi is required for this). + +Execbuff ioctl in VM_BIND mode +--- +The execbuff ioctl handling in VM_BIND mode differs significantly from the +older method. A VM in VM_BIND mode will not support older execbuff mode of +binding. In VM_BIND mode, execbuff ioctl will not accept any execlist. Hence, +no support for implicit sync. It is expected that the below work will be able +to support requirements of object dependency setting in all use cases: + +"dma-buf: Add an API for exporting sync files" +(https://lwn.net/Articles/859290/) I would really like to have more details here. The link provided points to new ioctls and we're not very familiar with those yet, so I think you should really clarify the interaction between the new additions here. Having some sample code would be really nice too. For Mesa at least (and I believe for the other drivers too) we always have a few exported buffers in every execbuf call, and we rely on the implicit synchronization provided by execbuf to make sure everything works. The execbuf ioctl also has some code to flush caches during implicit synchronization AFAIR, so I would guess we rely on it too and whatever else the Kernel does. Is that covered by the new ioctls? In addition, as far as I remember, one of the big improvements of vm_bind was that it would help reduce ioctl latency and cpu overhead. 
But if making execbuf faster comes at the cost of requiring additional ioctl calls for implicit synchronization, which is required on every execbuf call, then I wonder if we'll even get any faster at all. Comparing old execbuf vs plain new execbuf without the new required ioctls won't make sense. But maybe I'm wrong and we won't need to call these new ioctls around every single execbuf ioctl we submit? Again, more clarification and some code examples here would be really nice. This is a big change on an important part of the API; we should clarify the new expected usage.

Hey Paulo,

I think in the case of X11/Wayland, we'll be doing 1 or 2 extra ioctls per frame, which seems pretty reasonable. Essentially we need to set the dependencies on the buffer we're going to tell the display engine (gnome-shell/kde/bare-display-hw) to use. In the Vulkan case, we're t
Re: [Intel-gfx] [PATCH v3] drm/doc: add rfc section for small BAR uapi
On 17/05/2022 12:23, Tvrtko Ursulin wrote: On 17/05/2022 09:55, Lionel Landwerlin wrote: On 17/05/2022 11:29, Tvrtko Ursulin wrote: On 16/05/2022 19:11, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jon Bloomfield Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 164 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 215 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..4079d287750b --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,164 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size;

Is -1 possible today or when it will be? For system memory it appears zeroes are returned today so that has to stay I think. Does it effectively mean userspace has to consider both 0 and -1 as unknown is the question.

I raised this on v2. As far as I can tell there is no situation where we would get -1.

Is it really probed_size=0 on smem?? It's not the case on the internal branch.

My bad, I misread the arguments to intel_memory_region_create while grepping: struct intel_memory_region *i915_gem_shmem_setup(struct drm_i915_private *i915, u16 type, u16 instance) { return intel_memory_region_create(i915, 0, totalram_pages() << PAGE_SHIFT, PAGE_SIZE, 0, 0, type, instance, &shmem_region_ops); I saw "0, 0" and wrongly assumed that would be the data, since it matched with my mental model and the comment against unallocated_size saying it's only tracked for device memory. Although I'd say it is questionable for i915 to return this data. I wonder if a use case is possible where it would even be wrong, but I don't know. I guess the cat is out of the bag now.

Not sure how questionable that is. There are a bunch of tools reporting the amount of memory available (free, top, htop, etc...). It might not be totalram_pages() but probably something close to it. Having a non 0 & non -1 value is useful. -Lionel

If the situation is -1 for unknown and some valid size (not zero) I don't think there is a problem here. Regards, Tvrtko

Anv is not currently handling that case. I would very much like to not deal with 0 for smem. It really makes it easier for userspace rather than having to fish information from 2 different places and on top of dealing with multiple kernel versions.
-Lionel + + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE + * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get + * reliable accounting. Without this(or if this an older kernel) the s/if this an/if this is an/ Also same question as above about -1. + * value here will always match the @probed_size. + */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). Also question about -1. In this case this could be done since the field is yet to be added but I am curious if it ever can be -1. + * + * This will be always be <= @probed_s
Re: [Intel-gfx] [PATCH v3] drm/doc: add rfc section for small BAR uapi
On 17/05/2022 11:29, Tvrtko Ursulin wrote: On 16/05/2022 19:11, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jon Bloomfield Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 164 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 215 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..4079d287750b --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,164 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; Is -1 possible today or when it will be? 
For system memory it appears zeroes are returned today so that has to stay I think. Does it effectively mean userspace has to consider both 0 and -1 as unknown is the question.

I raised this on v2. As far as I can tell there is no situation where we would get -1.

Is it really probed_size=0 on smem?? It's not the case on the internal branch.

Anv is not currently handling that case. I would very much like to not deal with 0 for smem. It really makes it easier for userspace rather than having to fish information from 2 different places and on top of dealing with multiple kernel versions. -Lionel

+ + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE + * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get + * reliable accounting. Without this(or if this an older kernel) the

s/if this an/if this is an/ Also same question as above about -1.

+ * value here will always match the @probed_size. + */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown).

Also question about -1. In this case this could be done since the field is yet to be added but I am curious if it ever can be -1.

+ * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + * + * On systems without small BAR, the @probed_size will + * always equal the @probed_cpu_visible_size, since all + * of it will be CPU accessible. + * + * Note that if the value returned here is zero, then + * this must be an old kernel which lacks the relevant + * small-bar uAPI support(including

I have noticed you prefer no space before parentheses throughout the text so I guess it's just my preference to have it. Very nitpicky even if I am right so up to you.
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on + * such systems we should never actually end up with a + * small BAR configuration, assuming we are able to load + * the kernel module. Hence it should be safe to treat + * this the same as when @probed_cpu_visible_size == + * @probed_size. + */ + __u64 probed_cpu_visible_size; + + /** + * @unallocated_cpu_visible_size: Estimate of CPU + * visible memory remaining (-1 = unknown). + * + * Note this is only
Re: [Intel-gfx] [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj
On 14/05/2022 00:06, Jordan Justen wrote:

On 2022-05-13 05:31:00, Lionel Landwerlin wrote:

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of the memory region preference list of the objects on their memory residency and Flat-CCS capability. v2: Fix the Flat-CCS capability of an obj with {lmem, smem} preference list [Thomas] v3: Reworded the doc [Matt] Signed-off-by: Ramalingam C cc: Matthew Auld cc: Thomas Hellstrom cc: Daniel Vetter cc: Jon Bloomfield cc: Lionel Landwerlin cc: Kenneth Graunke cc: mesa-...@lists.freedesktop.org cc: Jordan Justen cc: Tony Ye Reviewed-by: Matthew Auld --- include/uapi/drm/i915_drm.h | 16 1 file changed, 16 insertions(+) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index a2def7b27009..b7e1c2fe08dc 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext { * At which point we get the object handle in &drm_i915_gem_create_ext.handle, * along with the final object size in &drm_i915_gem_create_ext.size, which * should account for any rounding up, if required. + * + * Note that userspace has no means of knowing the current backing region + * for objects where @num_regions is larger than one. The kernel will only + * ensure that the priority order of the @regions array is honoured, either + * when initially placing the object, or when moving memory around due to + * memory pressure. + * + * On Flat-CCS capable HW, compression is supported for the objects residing + * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) have another + * memory class in @regions and are migrated (by I915, due to memory + * constraints) to a non I915_MEMORY_CLASS_DEVICE region, then I915 needs to + * decompress the content. But I915 doesn't have the required information to + * decompress the userspace compressed objects. + * + * So I915 supports Flat-CCS only on the objects which can reside only on + * I915_MEMORY_CLASS_DEVICE regions.
I think it's fine to assume Flat-CCS surfaces will always be in lmem. I see no issue for the Anv Vulkan driver. Maybe Nanley or Ken can speak for the Iris GL driver?

Acked-by: Jordan Justen

I think Nanley has accounted for this on iris with: https://gitlab.freedesktop.org/mesa/mesa/-/commit/42a865730ef72574e179b56a314f30fdccc6cba8 -Jordan

Thanks Jordan,

We might want to throw in an additional assert((flags & BO_ALLOC_SMEM) == 0); in the CCS case.

-Lionel
Re: [Intel-gfx] [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj
On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of the memory region preference list of the objects on their memory residency and Flat-CCS capability. v2: Fix the Flat-CCS capability of an obj with {lmem, smem} preference list [Thomas] v3: Reworded the doc [Matt] Signed-off-by: Ramalingam C cc: Matthew Auld cc: Thomas Hellstrom cc: Daniel Vetter cc: Jon Bloomfield cc: Lionel Landwerlin cc: Kenneth Graunke cc: mesa-...@lists.freedesktop.org cc: Jordan Justen cc: Tony Ye Reviewed-by: Matthew Auld --- include/uapi/drm/i915_drm.h | 16 1 file changed, 16 insertions(+) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index a2def7b27009..b7e1c2fe08dc 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext { * At which point we get the object handle in &drm_i915_gem_create_ext.handle, * along with the final object size in &drm_i915_gem_create_ext.size, which * should account for any rounding up, if required. + * + * Note that userspace has no means of knowing the current backing region + * for objects where @num_regions is larger than one. The kernel will only + * ensure that the priority order of the @regions array is honoured, either + * when initially placing the object, or when moving memory around due to + * memory pressure. + * + * On Flat-CCS capable HW, compression is supported for the objects residing + * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) have another + * memory class in @regions and are migrated (by I915, due to memory + * constraints) to a non I915_MEMORY_CLASS_DEVICE region, then I915 needs to + * decompress the content. But I915 doesn't have the required information to + * decompress the userspace compressed objects. + * + * So I915 supports Flat-CCS only on the objects which can reside only on + * I915_MEMORY_CLASS_DEVICE regions.

I think it's fine to assume Flat-CCS surfaces will always be in lmem.
I see no issue for the Anv Vulkan driver. Maybe Nanley or Ken can speak for the Iris GL driver? -Lionel */ struct drm_i915_gem_create_ext_memory_regions { /** @base: Extension link. See struct i915_user_extension. */
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 17:27, Matthew Auld wrote: On 03/05/2022 11:39, Lionel Landwerlin wrote: On 03/05/2022 13:22, Matthew Auld wrote: On 02/05/2022 09:53, Lionel Landwerlin wrote: On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + };

Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After that, it should either be the entire lmem or something smaller. -Lionel

Another pain point of this new uAPI: previously we could query the unallocated size for each heap.

unallocated_size should always give the same value as probed_size. We have the avail tracking, but we don't currently expose that through unallocated_size, due to lack of real userspace/user etc. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Yeah, if we ever properly expose the unallocated_size, then we could also just add unallocated_cpu_visible_size.

Is adding a new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of the question?

I don't think it's out of the question... I guess user-space should be able to get the current flag behaviour just by specifying: device, system. And it does give more flexibility to allow something like: device, device-nm, smem. We can also drop the probed_cpu_visible_size, which would now just be the probed_size with device/device-nm.
And if we lack device-nm, then the entire thing must be CPU mappable. One of the downsides though, is that we can no longer easily mix object pages from both device + device-nm, which we could previously do when we didn't specify the flag. At least according to the current design/behaviour for @regions that would not be allowed. I guess some kind of new flag like ALLOC_MIXED or so? Although currently that is only possible with device + device-nm in ttm/i915. Thanks, I wasn't aware of the restrictions. Adding unallocated_cpu_visible_size would be great. So do we want this in the next version? i.e we already have a current real use case in mind for unallocated_size where probed_size is not good enough? Yeah in the next iteration. We're using unallocated_size to implement VK_EXT_memory_budget and since I'm going to expose lmem mappable/unmappable as 2 different heaps on Vulkan, I would use that there too. -Lionel -Lionel -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 13:22, Matthew Auld wrote: On 02/05/2022 09:53, Lionel Landwerlin wrote: On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will always be <= @probed_size, and the + * remainder (if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After that it should either be the entire lmem or something smaller. -Lionel Another pain point of this new uAPI: previously we could query the unallocated size for each heap. unallocated_size should always give the same value as probed_size. We have the avail tracking, but we don't currently expose that through unallocated_size, due to lack of real userspace/user etc. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Yeah, if we ever properly expose the unallocated_size, then we could also just add unallocated_cpu_visible_size. Is adding a new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of the question? I don't think it's out of the question... I guess user-space should be able to get the current flag behaviour just by specifying: device, system. And it does give more flexibility to allow something like: device, device-nm, smem. We can also drop the probed_cpu_visible_size, which would now just be the probed_size with device/device-nm.
And if we lack device-nm, then the entire thing must be CPU mappable. One of the downsides though, is that we can no longer easily mix object pages from both device + device-nm, which we could previously do when we didn't specify the flag. At least according to the current design/behaviour for @regions that would not be allowed. I guess some kind of new flag like ALLOC_MIXED or so? Although currently that is only possible with device + device-nm in ttm/i915. Thanks, I wasn't aware of the restrictions. Adding unallocated_cpu_visible_size would be great. -Lionel -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested siz
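For reference, splitting lmem into the two heap sizes discussed above might look like the following. This is an illustrative sketch, assuming the 0/-1 conventions described for probed_cpu_visible_size; the helper name is invented:

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_UNKNOWN (~0ULL) /* -1 in a __u64 */

/* Split one lmem region into the CPU-mappable and non-mappable sizes a
 * driver would advertise as two heaps. Both 0 (older kernel, the field
 * is MBZ) and -1 (unknown) are treated as "everything is mappable",
 * matching the pre-small-BAR behaviour. Invented helper, not real uapi. */
static void split_lmem(uint64_t probed_size, uint64_t probed_cpu_visible_size,
		       uint64_t *mappable, uint64_t *non_mappable)
{
	if (probed_cpu_visible_size == 0 ||
	    probed_cpu_visible_size == SIZE_UNKNOWN ||
	    probed_cpu_visible_size > probed_size)
		probed_cpu_visible_size = probed_size;

	*mappable = probed_cpu_visible_size;
	*non_mappable = probed_size - probed_cpu_visible_size;
}
```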
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 12:07, Matthew Auld wrote: On 02/05/2022 19:03, Lionel Landwerlin wrote: On 02/05/2022 20:58, Abodunrin, Akeem G wrote: -Original Message- From: Landwerlin, Lionel G Sent: Monday, May 2, 2022 12:55 AM To: Auld, Matthew ; intel-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org; Thomas Hellström ; Bloomfield, Jon ; Daniel Vetter ; Justen, Jordan L ; Kenneth Graunke ; Abodunrin, Akeem G ; mesa-...@lists.freedesktop.org Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as +known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id +DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I believe it is set to -1 if it is unknown, and/or not cpu accessible... Cheers! ~Akeem So what should I expect on system memory? I guess just probed_cpu_visible_size == probed_size. Or maybe we can just use -1 here? What value is returned when all of probed_size is CPU visible on local memory? probed_size == probed_cpu_visible_size. Thanks, looks good to me. Then maybe we should update the comment to say that. Looks like there are no cases where we'll get -1. -Lionel Thanks, -Lionel I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, +with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the +stuff that + * is immutable. Previously we would have two ioctls, one to create +the object + * with gem_create, and another to apply various parameters, however +this + * creates some ambiguity for the params which are considered +immutable. 
Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we have might have further minimum + * page-size restrictions(larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are no
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 02/05/2022 20:58, Abodunrin, Akeem G wrote: -Original Message- From: Landwerlin, Lionel G Sent: Monday, May 2, 2022 12:55 AM To: Auld, Matthew ; intel-gfx@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org; Thomas Hellström ; Bloomfield, Jon ; Daniel Vetter ; Justen, Jordan L ; Kenneth Graunke ; Abodunrin, Akeem G ; mesa-...@lists.freedesktop.org Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as +known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id +DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). +* +* This will be always be <= @probed_size, and the +* remainder(if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I believe it is set to -1 if it is unknown, and/or not cpu accessible... Cheers! ~Akeem So what should I expect on system memory? What value is returned when all of probed_size is CPU visible on local memory? Thanks, -Lionel I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, +with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the +stuff that + * is immutable. Previously we would have two ioctls, one to create +the object + * with gem_create, and another to apply various parameters, however +this + * creates some ambiguity for the params which are considered +immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. 
+* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of t
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). 
+ * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel Other pain point of this new uAPI, previously we could query the unallocated size for each heap. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question? -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we have might have further minimum + * page-size restrictions(larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + /** + * @flags: Optional flags. 
+ * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and + * only strictly required on platforms where only some of the device + * memory is directly visible or mappable through the CPU, like on DG2+. + * + * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to + * ensure we can always spill the allocation to system memory, if we + * can't place the object in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Note that since th
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). 
+* +* This will always be <= @probed_size, and the +* remainder (if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After that it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we might have further minimum +* page-size restrictions (larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU.
+* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. +* +* Note that since the kernel only supports flat-CCS on objects that can +* *only* be placed in I915_MEMORY_C
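The placement rule quoted above (NEEDS_CPU_ACCESS together with a device placement requires a system placement to spill into) can be expressed as a small check. The enum and flag values below are stand-ins for the real uapi constants, not the actual header definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the uapi memory classes and flag. */
enum mem_class { MEM_CLASS_SYSTEM = 0, MEM_CLASS_DEVICE = 1 };
#define FLAG_NEEDS_CPU_ACCESS (1u << 0)

/* Returns false for the combination the kernel would reject: the
 * CPU-access flag with a device placement but no system fallback. */
static bool placements_valid(uint32_t flags,
			     const enum mem_class *placements, int n)
{
	bool has_device = false, has_system = false;

	for (int i = 0; i < n; i++) {
		if (placements[i] == MEM_CLASS_DEVICE)
			has_device = true;
		else if (placements[i] == MEM_CLASS_SYSTEM)
			has_system = true;
	}

	if ((flags & FLAG_NEEDS_CPU_ACCESS) && has_device && !has_system)
		return false;
	return true;
}
```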
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 27/04/2022 18:18, Matthew Auld wrote: On 27/04/2022 07:48, Lionel Landwerlin wrote: One question though, how do we detect that this flag (I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel? I assume older kernels are going to reject object creation if we use this flag? From some offline discussion with Lionel, the plan here is to just do a dummy gem_create_ext to check if the kernel throws an error with the new flag or not. I didn't plan to use __drm_i915_query_vma_info, but isn't it inconsistent to select the placement on the GEM object and then query whether it's mappable by address? You made a comment stating this is racy, wouldn't querying on the GEM object prevent this? Since mesa doesn't currently have a use for this one, I guess we should maybe just drop this part of the uapi, in this version at least, if no objections. Just repeating what we discussed (maybe I missed some other discussion and that's why I was confused):

The way I was planning to use this is to have 3 heaps in Vulkan:
- heap0: local only, not cpu visible
- heap1: system, cpu visible
- heap2: local & cpu visible

With heap2 having the reported probed_cpu_visible_size. It is an error for the application to map from heap0 [1]. With that said, it means if we created a GEM BO without I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, we'll never mmap it. So why the query? I guess it would be useful when we import a buffer from another application. But in that case, why not have the query on the BO? -Lionel [1]: https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkMapMemory.html (VUID-vkMapMemory-memory-00682) Thanks, -Lionel On 27/04/2022 09:35, Lionel Landwerlin wrote: Hi Matt, The proposal looks good to me. Looking forward to trying it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks.
(Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. 
+ */ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * +
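The detection scheme Matthew describes above (do a dummy gem_create_ext and see whether the kernel rejects the new flag) might be sketched like this. The ioctl is abstracted behind a function pointer so the logic can be exercised without a real drm fd; none of these names are the real libdrm API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation callback: 0 on success, negative errno on failure. */
typedef int (*create_fn)(uint32_t flags, uint32_t *handle_out);

static bool kernel_supports_needs_cpu_access(create_fn gem_create_ext)
{
	uint32_t handle = 0;

	/* 1u << 0 stands in for I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS. */
	if (gem_create_ext(1u << 0, &handle))
		return false; /* e.g. -EINVAL from an older kernel */
	/* A real driver would gem_close(handle) here before returning. */
	return true;
}

/* Fakes emulating a new and an old kernel, for exercising the probe. */
static int fake_new_kernel(uint32_t flags, uint32_t *h)
{
	(void)flags;
	*h = 1;
	return 0;
}

static int fake_old_kernel(uint32_t flags, uint32_t *h)
{
	(void)h;
	return flags ? -EINVAL : 0; /* unknown flags are rejected */
}
```

Doing the probe once at device initialisation and caching the result keeps the extra ioctl off any hot path.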
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
One question though, how do we detect that this flag (I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel? I assume older kernels are going to reject object creation if we use this flag? I didn't plan to use __drm_i915_query_vma_info, but isn't it inconsistent to select the placement on the GEM object and then query whether it's mappable by address? You made a comment stating this is racy, wouldn't querying on the GEM object prevent this? Thanks, -Lionel On 27/04/2022 09:35, Lionel Landwerlin wrote: Hi Matt, The proposal looks good to me. Looking forward to try it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we have might have further minimum + * page-size restrictions(larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + /** + * @flags: Optional flags. 
+ * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and + * only strictly required on platforms where only some of the device + * memory is directly visible or mappable through the CPU, like on DG2+. + * + * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to + * ensure we can always spill the allocation to system memory, if we + * can't place the object in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Note that since the kernel only supports fla
Re: [Intel-gfx] [PATCH v2] drm/doc: add rfc section for small BAR uapi
Hi Matt, The proposal looks good to me. Looking forward to try it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). 
+* +* This will be always be <= @probed_size, and the +* remainder(if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. 
+* +* Note that since the kernel only supports flat-CCS on objects that can +* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't +* support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with +* flat-CCS. +* +* Without this hint, the kernel will assume that non-mappable +* I915_MEMORY_CLASS_DE
Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi
Hey Matthew, all, This sounds like a good thing to have. There are a number of DG2 machines where we have a small BAR and this is causing more apps to fail. Anv currently reports 3 memory heaps to the app : - local device only (not host visible) -> mapped to lmem - device/cpu -> mapped to smem - local device but also host visible -> mapped to lmem So we could use this straight away, by just not putting the I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the first heap. One thing I don't see in this proposal is how can we get the size of the 2 lmem heap : cpu visible, cpu not visible We could use that to report the appropriate size to the app. We probably want to report a new drm_i915_memory_region_info and either : - put one of the reserve field to use to indicate : cpu visible - or define a new enum value in drm_i915_gem_memory_class Cheers, -Lionel On 18/02/2022 13:22, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 153 +++ Documentation/gpu/rfc/i915_small_bar.rst | 40 ++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 197 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..fa65835fd608 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,153 @@ +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that in the future we want to have our buffer flags here, at least for + * the stuff that is immutable. 
Previously we would have two ioctls, one to + * create the object with gem_create, and another to apply various parameters, + * however this creates some ambiguity for the params which are considered + * immutable. Also in general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. +* +* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE, +* will need to enable this hint, if the object can also be placed in +* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will +* throw an error otherwise. This also means that such objects will need +* I915_MEMORY_CLASS_SYSTEM set as a possible placement. +* +* Without this hint, the kernel will assume that non-mappable +* I915_MEMORY_CLASS_DEVICE is preferred for this object. 
Note that the +* kernel can still migrate the object to the mappable part, as a last +* resort, if userspace ever CPU faults this object, but this might be +* expensive, and so ideally should be avoided. +*/ +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0) + __u32 flags; + /** +* @extensions: The chain of extensions to apply to this object. +* +* This will be useful in the future when we need to support several +* different extensions, and we need to apply more than one when +* creating the object. See struct i915_user_extension. +* +* If we d
Re: [Intel-gfx] [PATCH v4 12/16] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 09/12/2021 17:45, Ramalingam C wrote: From: Mika Kahola DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color rendering. Signed-off-by: Mika Kahola cc: Anshuman Gupta Signed-off-by: Juha-Pekka Heikkilä Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/display/intel_fb.c| 8 drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 - include/uapi/drm/drm_fourcc.h | 8 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index e15216f1cb82..f10e77cb5b4a 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 14 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC, + }, { + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC, + .display_ver = { 13, 14 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC, + + .ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 14 }, @@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default: diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index d80424194c75..9a89df9c0243 100644 --- 
a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: + return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2337,10 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) { - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE | + PLANE_CTL_CLEAR_COLOR_DISABLE; + + if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; + else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else { diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 51fdda26844a..b155f69f2344 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -598,6 +598,14 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11) My colleague Nanley (Cc) had some requests for clarifications on this new modifier. In particular in which plane is the clear color located. I guess it wouldn't hurt to also state for each of the new modifiers defined in this series, how many planes and what data they contain. Thanks, -Lionel +/* + * Intel color control surfaces (CCS) for DG2 clear color render compression. + * + * DG2 uses a unified compression format for clear color render compression. 
+ * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. + */ +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12) + /* * Tiled, NV12MT, grouped in 64 (pi
Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/uapi: Add query for hwconfig table
On 04/11/2021 01:49, John Harrison wrote: On 11/3/2021 14:38, Jordan Justen wrote: John Harrison writes: On 11/1/2021 08:39, Jordan Justen wrote: writes: From: Rodrigo Vivi GuC contains a consolidated table with a bunch of information about the current device. Previously, this information was spread and hardcoded to all the components including GuC, i915 and various UMDs. The goal here is to consolidate the data into GuC in a way that all interested components can grab the very latest and synchronized information using a simple query. As per most of the other queries, this one can be called twice. Once with item.length=0 to determine the exact buffer size, then allocate the user memory and call it again for to retrieve the table data. For example: struct drm_i915_query_item item = { .query_id = DRM_I915_QUERY_HWCONCFIG_TABLE; }; query.items_ptr = (int64_t) &item; query.num_items = 1; ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query)); if (item.length <= 0) return -ENOENT; data = malloc(item.length); item.data_ptr = (int64_t) &data; ioctl(fd, DRM_IOCTL_I915_QUERY, query, sizeof(query)); // Parse the data as appropriate... The returned array is a simple and flexible KLV (Key/Length/Value) formatted table. For example, it could be just: enum device_attr { ATTR_SOME_VALUE = 0, ATTR_SOME_MASK = 1, }; static const u32 hwconfig[] = { ATTR_SOME_VALUE, 1, // Value Length in DWords 8, // Value ATTR_SOME_MASK, 3, 0x00, 0x, 0xFF00, }; Seems simple enough, so why doesn't i915 define the format of the returned hwconfig blob in i915_drm.h? Because the definition is nothing to do with i915. This table comes from the hardware spec. It is not defined by the KMD and it is not currently used by the KMD. So there is no reason for the KMD to be creating structures for it in the same way that the KMD does not document, define, struct, etc. every other feature of the hardware that the UMDs might use. So, i915 wants to wash it's hands completely of the format? 
There is obviously a difference between hardware features and a blob coming from closed source software. (Which i915 just happens to be passing along.) The hardware is a lot more difficult to change... Actually, no. The table is not "coming from closed source software". The table is defined by hardware specs. It is a table of hardware specific values. It is not being invented by the GuC just for fun or as a way to subvert the universe into the realms of closed source software. As per KMD, GuC is merely passing the table through. The table is only supported on newer hardware platforms and all GuC does is provide a mechanism for the KMD to retrieve it because the KMD cannot access it directly. The table contents are defined by hardware architects same as all the other aspects of the hardware. It seems like these details should be dropped from the i915 patch commit message since i915 wants nothing to do with it. Sure. Can remove comments. I would think it'd be preferable for i915 to stand behind the basic blob format as is (even if the keys/values can't be defined), and make a new query item if the closed source software changes the format. Close source software is not allowed to change the format because closed source software has no say in defining the format. The format is officially defined as being fixed in the spec. New key values can be added to the key enumeration but existing values cannot be deprecated and re-purposed. The table must be stable across all OSs and all platforms. No software can arbitrarily decide to change it. Of course, it'd be even better if i915 could define some keys/values as well. (Or if a spec could be released to help document / tie down the format.) See the corresponding IGT test that details all the currently defined keys. struct drm_i915_hwconfig { uint32_t key; uint32_t length; uint32_t values[]; }; It sounds like the kernel depends on the closed source guc being loaded to return this information. Is that right? 
Will i915 also become dependent on some of this data such that it won't be able to initialize without the firmware being loaded? At the moment, the KMD does not use the table at all. We merely provide a mechanism for the UMDs to retrieve it from the hardware. In terms of future direction, that is something you need to take up with the hardware architects. Why do you keep saying hardware, when only software is involved? See above - because the table is defined by hardware. No software, closed or open, has any say in the specification of the table. The values in the table might be defined by hardware, but the table itself definitely isn't. Like Jordan tried to explain, because this is a software interface, it changes more often than HW. Now testing doesn't just invol
Re: [Intel-gfx] [PATCH] DRM: i915: i915_perf: Fixed compiler warning
On 09/08/2021 05:33, Julius Victorian wrote:

From: Julius

Fixed compiler warning: "left shift of negative value"

Signed-off-by: Julius Victorian

Reviewed-by: Lionel Landwerlin

Thanks!

---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9f94914958c3..7b852974241e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2804,7 +2804,7 @@ get_default_sseu_config(struct intel_sseu *out_sseu,
 	 * all available subslices per slice.
 	 */
 	out_sseu->subslice_mask =
-		~(~0 << (hweight8(out_sseu->subslice_mask) / 2));
+		~(~0U << (hweight8(out_sseu->subslice_mask) / 2));
 	out_sseu->slice_mask = 0x1;
 }
}
Re: [Intel-gfx] [PATCH 31/53] drm/i915/dg2: Report INSTDONE_GEOM values in error state
On 01/07/2021 23:24, Matt Roper wrote: Xe_HPG adds some additional INSTDONE_GEOM debug registers; the Mesa team has indicated that having these reported in the error state would be useful for debugging GPU hangs. These registers are replicated per-DSS with gslice steering. Cc: Lionel Landwerlin Signed-off-by: Matt Roper Thanks, Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c| 7 +++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 +++ drivers/gpu/drm/i915/i915_gpu_error.c| 10 -- drivers/gpu/drm/i915/i915_reg.h | 1 + 4 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e1302e9c168b..b3c002e4ae9f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1220,6 +1220,13 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine, GEN7_ROW_INSTDONE); } } + + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) { + for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice) + instdone->geom_svg[slice][subslice] = + read_subslice_reg(engine, slice, subslice, + XEHPG_INSTDONE_GEOM_SVG); + } } else if (GRAPHICS_VER(i915) >= 7) { instdone->instdone = intel_uncore_read(uncore, RING_INSTDONE(mmio_base)); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e917b7519f2b..93609d797ac2 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -80,6 +80,9 @@ struct intel_instdone { u32 slice_common_extra[2]; u32 sampler[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; u32 row[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; + + /* Added in XeHPG */ + u32 geom_svg[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; }; /* diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index c1e744b5ab47..4de7edc451ef 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ 
b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -431,6 +431,7 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, const struct sseu_dev_info *sseu = &ee->engine->gt->info.sseu; int slice; int subslice; + int iter; err_printf(m, " INSTDONE: 0x%08x\n", ee->instdone.instdone); @@ -445,8 +446,6 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, return; if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 50)) { - int iter; - for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, subslice) err_printf(m, " SAMPLER_INSTDONE[%d][%d]: 0x%08x\n", slice, subslice, @@ -471,6 +470,13 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, if (GRAPHICS_VER(m->i915) < 12) return; + if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 55)) { + for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, subslice) + err_printf(m, " GEOM_SVGUNIT_INSTDONE[%d][%d]: 0x%08x\n", + slice, subslice, + ee->instdone.geom_svg[slice][subslice]); + } + err_printf(m, " SC_INSTDONE_EXTRA: 0x%08x\n", ee->instdone.slice_common_extra[0]); err_printf(m, " SC_INSTDONE_EXTRA2: 0x%08x\n", diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 35a42df1f2aa..d58864c7adc6 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2686,6 +2686,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN12_SC_INSTDONE_EXTRA2 _MMIO(0x7108) #define GEN7_SAMPLER_INSTDONE _MMIO(0xe160) #define GEN7_ROW_INSTDONE _MMIO(0xe164) +#define XEHPG_INSTDONE_GEOM_SVG_MMIO(0x666c) #define MCFG_MCR_SELECTOR _MMIO(0xfd0) #define SF_MCR_SELECTOR _MMIO(0xfd8) #define GEN8_MCR_SELECTOR _MMIO(0xfdc) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count
On 10/06/2021 23:46, john.c.harri...@intel.com wrote: From: John Harrison Various UMDs need to know the L3 bank count. So add a query API for it. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/i915_query.c | 22 ++ drivers/gpu/drm/i915/i915_reg.h| 1 + include/uapi/drm/i915_drm.h| 1 + 5 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 2161bf01ef8b..708bb3581d83 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info, intel_sseu_dump(&info->sseu, p); } + +int intel_gt_get_l3bank_count(struct intel_gt *gt) +{ + struct drm_i915_private *i915 = gt->i915; + intel_wakeref_t wakeref; + u32 fuse3; + + if (GRAPHICS_VER(i915) < 12) + return -ENODEV; + + with_intel_runtime_pm(gt->uncore->rpm, wakeref) + fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3); + + return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3)); +} diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 7ec395cace69..46aa1cf4cf30 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt) void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); +int intel_gt_get_l3bank_count(struct intel_gt *gt); void intel_gt_watchdog_work(struct work_struct *work); diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 96bd8fb3e895..0e92bb2d21b2 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -10,6 +10,7 @@ #include "i915_perf.h" #include "i915_query.h" #include +#include "gt/intel_gt.h" static int copy_query_item(void *query_hdr, size_t query_sz, u32 total_length, @@ -502,6 +503,26 @@ static 
int query_hwconfig_table(struct drm_i915_private *i915, return hwconfig->size; } +static int query_l3banks(struct drm_i915_private *i915, +struct drm_i915_query_item *query_item) +{ + u32 banks; + + if (query_item->length == 0) + return sizeof(banks); + + if (query_item->length < sizeof(banks)) + return -EINVAL; + + banks = intel_gt_get_l3bank_count(&i915->gt); + + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr), +&banks, sizeof(banks))) + return -EFAULT; + + return sizeof(banks); +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, @@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, query_perf_config, query_memregion_info, query_hwconfig_table, + query_l3banks, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index eb13c601d680..e9ba88fe3db7 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN10_MIRROR_FUSE3 _MMIO(0x9118) #define GEN10_L3BANK_PAIR_COUNT 4 #define GEN10_L3BANK_MASK 0x0F +#define GEN12_GT_L3_MODE_MASK 0xFF #define GEN8_EU_DISABLE0 _MMIO(0x9134) #define GEN8_EU_DIS0_S0_MASK0xff diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 87d369cae22a..20d18cca5066 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2234,6 +2234,7 @@ struct drm_i915_query_item { #define DRM_I915_QUERY_PERF_CONFIG 3 #define DRM_I915_QUERY_MEMORY_REGIONS 4 #define DRM_I915_QUERY_HWCONFIG_TABLE 5 +#define DRM_I915_QUERY_L3_BANK_COUNT6 A little bit of documentation about the format of the return data would be nice :) -Lionel /* Must be kept compact -- no holes and well documented */ /** ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org 
Re: [Intel-gfx] [PATCH i-g-t v2] lib/i915/perf: Fix non-card0 processing
On 05/05/2021 15:41, Janusz Krzysztofik wrote: IGT i915/perf library functions now always operate on sysfs perf attributes of card0 device node, no matter which DRM device fd a user passes. The intention was to always switch to primary device node if a user passes a render device node fd, but that breaks handling of non-card0 devices. If a user passed a render device node fd, find a primary device node of the same device and use it instead of forcibly using the primary device with minor number 0 when opening the device sysfs area. v2: Don't assume primary minor matches render minor with masked type. Signed-off-by: Janusz Krzysztofik Cc: Lionel Landwerlin --- lib/i915/perf.c | 31 --- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/lib/i915/perf.c b/lib/i915/perf.c index 56d5c0b3a..d7768468e 100644 --- a/lib/i915/perf.c +++ b/lib/i915/perf.c @@ -372,14 +372,39 @@ open_master_sysfs_dir(int drm_fd) { char path[128]; struct stat st; + int sysfs; if (fstat(drm_fd, &st) || !S_ISCHR(st.st_mode)) return -1; -snprintf(path, sizeof(path), "/sys/dev/char/%d:0", - major(st.st_rdev)); + snprintf(path, sizeof(path), "/sys/dev/char/%d:%d", major(st.st_rdev), minor(st.st_rdev)); + sysfs = open(path, O_DIRECTORY); Just to spell out the error paths : if (sysfs < 0) return sysfs; - return open(path, O_DIRECTORY); + if (sysfs >= 0 && minor(st.st_rdev) >= 128) { Then just if (minor(st.st_rdev) >= 128) { ... Maybe add a comment above this is : /* If we were given a renderD* drm_fd, find it's associated cardX node. 
*/ + char device[100], cmp[100]; + int device_len, cmp_len, i; + + device_len = readlinkat(sysfs, "device", device, sizeof(device)); + close(sysfs); + if (device_len < 0) + return device_len; + + for (i = 0; i < 128; i++) { + + snprintf(path, sizeof(path), "/sys/dev/char/%d:%d", major(st.st_rdev), i); + sysfs = open(path, O_DIRECTORY); + if (sysfs < 0) + continue; + + cmp_len = readlinkat(sysfs, "device", cmp, sizeof(cmp)); + if (cmp_len == device_len && !memcmp(cmp, device, cmp_len)) + break; + + close(sysfs); You might want to set sysfs = -1 here just in the unlikely case this is never found. + } + } + + return sysfs; } struct intel_perf * With the proposed changes : Reviewed-by: Lionel Landwerlin ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC PATCH i-g-t] lib/i915/perf: Fix non-card0 processing
On 30/04/2021 19:18, Janusz Krzysztofik wrote:

IGT i915/perf library functions now always operate on sysfs perf attributes of the card0 device node, no matter which DRM device fd a user passes. The intention was to always switch to the primary device node if a user passes a render device node fd, but that breaks handling of non-card0 devices. Instead of forcibly using DRM device minor number 0 when opening a device sysfs area, convert the device minor number of a user-passed device fd to the minor number of the respective primary (cardX) device node.

Signed-off-by: Janusz Krzysztofik
---
 lib/i915/perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/i915/perf.c b/lib/i915/perf.c
index 56d5c0b3a..336824df7 100644
--- a/lib/i915/perf.c
+++ b/lib/i915/perf.c
@@ -376,8 +376,8 @@ open_master_sysfs_dir(int drm_fd)
 	if (fstat(drm_fd, &st) || !S_ISCHR(st.st_mode))
 		return -1;

-	snprintf(path, sizeof(path), "/sys/dev/char/%d:0",
-		 major(st.st_rdev));
+	snprintf(path, sizeof(path), "/sys/dev/char/%d:%d",
+		 major(st.st_rdev), minor(st.st_rdev) & ~128);

Isn't it minor(st.st_rdev) & 0xff ? Or even 0x3f ? Looks like /dev/dri/controlD64 can exist too.

-Lionel

 	return open(path, O_DIRECTORY);
 }
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 29/04/2021 03:34, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. v11: (Jani) - IS_GEN deprecated. User GRAPHICS_VER instead. 
v12: (Jason) - Split cpu timestamp array into timestamp and delta for cleaner API Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Lionel Landwerlin Thanks for the update : Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_query.c | 148 ++ include/uapi/drm/i915_drm.h | 52 +++ 2 files changed, 200 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..357c44e8177c 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,151 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* +* Use logic same as the perf subsystem to allow user to select the +* reference clock id to be used for timestamps. 
+*/ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + u64 *cpu_delta, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + *cpu_delta = local_clock(); + *cpu_ts = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + *cpu_delta = local_clock() - *cpu_delta; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, u64 *cpu_delta, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), +
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 23:45, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin wrote: On 28/04/2021 22:54, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote: On 28/04/2021 22:24, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote: On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lacks enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. Cc: dri-devel, Jason and Daniel for review. Thanks! v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. v11: (Jani) - IS_GEN deprecated. Use GRAPHICS_VER instead. 
Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..2594b93901ac 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), +
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 23:14, Lionel Landwerlin wrote: On 28/04/2021 22:54, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote: On 28/04/2021 22:24, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote: On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. Cc: dri-devel, Jason and Daniel for review. Thanks! v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. v11: (Jani) - IS_GEN deprecated. User GRAPHICS_VER instead. 
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 22:54, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote: On 28/04/2021 22:24, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote: On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. Cc: dri-devel, Jason and Daniel for review. Thanks! v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. v11: (Jani) - IS_GEN deprecated. User GRAPHICS_VER instead. 
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 22:24, Jason Ekstrand wrote: On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote: On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. Cc: dri-devel, Jason and Daniel for review. Thanks! v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. v11: (Jani) - IS_GEN deprecated. User GRAPHICS_VER instead. 
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 23/04/2021 18:11, Umesh Nerlige Ramappa wrote: On Fri, Apr 23, 2021 at 10:05:34AM +0300, Lionel Landwerlin wrote: On 21/04/2021 20:28, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. 
Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..25b96927ab92 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), + RING_TIMESTAMP_UDW(base), + cs_ts, + cpu_ts, + cpu_clock); + + intel_uncore_forcewake_put__locked(uncore, fw_domains); + spin_unlock_irq(&uncore->lock); + } + + return ret; +} + +static int +query_cs_cycles(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item) +{ + struct drm_i915_query_cs_cycles __user *query_ptr; + struct drm_i915_query_cs_cycles query; + struct intel_engine_cs *engine; +
Re: [Intel-gfx] [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 21/04/2021 20:28, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lack enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsytem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read v10: (Chris) - Use local_clock() to measure time taken to read lower dword of register and return it to user. 
Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..25b96927ab92 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* +* Use logic same as the perf subsystem to allow user to select the +* reference clock id to be used for timestamps. +*/ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), + RING_TIMESTAMP_UDW(base), + cs_ts, + cpu_ts, + cpu_clock); + + intel_uncore_forcewake_put__locked(uncore, fw_domains); + spin_unlock_irq(&uncore->lock); + } + + return ret; +} + +static int +query_cs_cycles(struct drm_i915_private *i915, +
Re: [Intel-gfx] [PATCH v3 00/16] Introduce Intel PXP
On 29/03/2021 01:56, Daniele Ceraolo Spurio wrote: PXP (Protected Xe Path) is an i915 component, available on GEN12+, that helps to establish the hardware protected session and manage the status of the alive software session, as well as its life cycle. Lots of minor changes and fixes, but the main changes in v3 are: - Using a protected object with a context not appropriately marked does no longer result in an execbuf failure. This is to avoid apps maliciously sharing protected/invalid objects to other apps and causing them to fail. - All the termination work now goes through the same worker function, which allows i915 to drop the mutex lock entirely. Cc: Gaurav Kumar Cc: Chris Wilson Cc: Rodrigo Vivi Cc: Joonas Lahtinen Cc: Juston Li Cc: Alan Previn Cc: Lionel Landwerlin I updated the Mesa MR to use this new version : - Iris: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8092 - Anv: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8064 No issue with this current iteration : Tested-by: Lionel Landwerlin Anshuman Gupta (2): drm/i915/pxp: Add plane decryption support drm/i915/pxp: black pixels on pxp disabled Bommu Krishnaiah (2): drm/i915/uapi: introduce drm_i915_gem_create_ext drm/i915/pxp: User interface for Protected buffer Daniele Ceraolo Spurio (6): drm/i915/pxp: Define PXP component interface drm/i915/pxp: define PXP device flag and kconfig drm/i915/pxp: allocate a vcs context for pxp usage drm/i915/pxp: set KCR reg init drm/i915/pxp: interface for marking contexts as using protected content drm/i915/pxp: enable PXP for integrated Gen12 Huang, Sean Z (5): drm/i915/pxp: Implement funcs to create the TEE channel drm/i915/pxp: Create the arbitrary session after boot drm/i915/pxp: Implement arb session teardown drm/i915/pxp: Implement PXP irq handler drm/i915/pxp: Enable PXP power management Vitaly Lubart (1): mei: pxp: export pavp client to me client bus drivers/gpu/drm/i915/Kconfig | 11 + drivers/gpu/drm/i915/Makefile | 9 + 
.../drm/i915/display/skl_universal_plane.c| 50 +++- drivers/gpu/drm/i915/gem/i915_gem_context.c | 59 +++- drivers/gpu/drm/i915/gem/i915_gem_context.h | 18 ++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_create.c| 68 - .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 34 +++ drivers/gpu/drm/i915/gem/i915_gem_object.c| 6 + drivers/gpu/drm/i915/gem/i915_gem_object.h| 12 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 13 + drivers/gpu/drm/i915/gt/intel_engine.h| 12 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 32 ++- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 22 +- drivers/gpu/drm/i915/gt/intel_gt.c| 5 + drivers/gpu/drm/i915/gt/intel_gt_irq.c| 7 + drivers/gpu/drm/i915/gt/intel_gt_pm.c | 14 +- drivers/gpu/drm/i915/gt/intel_gt_types.h | 3 + drivers/gpu/drm/i915/i915_drv.c | 4 +- drivers/gpu/drm/i915/i915_drv.h | 4 + drivers/gpu/drm/i915/i915_pci.c | 2 + drivers/gpu/drm/i915/i915_reg.h | 48 drivers/gpu/drm/i915/intel_device_info.h | 1 + drivers/gpu/drm/i915/pxp/intel_pxp.c | 262 ++ drivers/gpu/drm/i915/pxp/intel_pxp.h | 65 + drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c | 140 ++ drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h | 15 + drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 100 +++ drivers/gpu/drm/i915/pxp/intel_pxp_irq.h | 32 +++ drivers/gpu/drm/i915/pxp/intel_pxp_pm.c | 37 +++ drivers/gpu/drm/i915/pxp/intel_pxp_pm.h | 23 ++ drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 172 drivers/gpu/drm/i915/pxp/intel_pxp_session.h | 15 + drivers/gpu/drm/i915/pxp/intel_pxp_tee.c | 182 drivers/gpu/drm/i915/pxp/intel_pxp_tee.h | 17 ++ drivers/gpu/drm/i915/pxp/intel_pxp_types.h| 43 +++ drivers/misc/mei/Kconfig | 2 + drivers/misc/mei/Makefile | 1 + drivers/misc/mei/pxp/Kconfig | 13 + drivers/misc/mei/pxp/Makefile | 7 + drivers/misc/mei/pxp/mei_pxp.c| 233 drivers/misc/mei/pxp/mei_pxp.h| 18 ++ include/drm/i915_component.h | 1 + include/drm/i915_pxp_tee_interface.h | 45 +++ include/uapi/drm/i915_drm.h | 96 +++ 45 files changed, 1931 insertions(+), 24 
deletions(-) create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp.c create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp.h create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h create mode 100644 drivers/gpu/drm/i915/pxp
Re: [Intel-gfx] [PATCH v3 11/16] drm/i915/pxp: interface for marking contexts as using protected content
On 29/03/2021 01:57, Daniele Ceraolo Spurio wrote: Extra tracking and checks around protected objects, coming in a follow-up patch, will be enabled only for contexts that opt in. Contexts can only be marked as using protected content at creation time and they must be both bannable and not recoverable. When a PXP teardown occurs, all gem contexts marked this way that have been used at least once will be marked as invalid and all new submissions using them will be rejected. All intel contexts within the invalidated gem contexts will be marked banned. A new flag has been added to the RESET_STATS ioctl to report the invalidation to userspace. v2: split to its own patch and improve doc (Chris), invalidate contexts on teardown v3: improve doc, use -EACCES for execbuf fail (Chris), make protected context flag not mandatory in protected object execbuf to avoid abuse (Lionel) Signed-off-by: Daniele Ceraolo Spurio Cc: Chris Wilson Cc: Lionel Landwerlin --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 59 ++- drivers/gpu/drm/i915/gem/i915_gem_context.h | 18 ++ .../gpu/drm/i915/gem/i915_gem_context_types.h | 2 + .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 18 ++ drivers/gpu/drm/i915/pxp/intel_pxp.c | 48 +++ drivers/gpu/drm/i915/pxp/intel_pxp.h | 1 + drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 3 + include/uapi/drm/i915_drm.h | 26 8 files changed, 172 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index fd8ee52e17a4..f3fd302682bb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -76,6 +76,8 @@ #include "gt/intel_gpu_commands.h" #include "gt/intel_ring.h" +#include "pxp/intel_pxp.h" + #include "i915_gem_context.h" #include "i915_globals.h" #include "i915_trace.h" @@ -1972,6 +1974,40 @@ static int set_priority(struct i915_gem_context *ctx, return 0; } +static int set_protected(struct i915_gem_context *ctx, +const struct 
drm_i915_gem_context_param *args) +{ + int ret = 0; + + if (!intel_pxp_is_enabled(&ctx->i915->gt.pxp)) + ret = -ENODEV; + else if (ctx->file_priv) /* can't change this after creation! */ + ret = -EEXIST; + else if (args->size) + ret = -EINVAL; + else if (!args->value) + clear_bit(UCONTEXT_PROTECTED, &ctx->user_flags); + else if (i915_gem_context_is_recoverable(ctx) || +!i915_gem_context_is_bannable(ctx)) + ret = -EPERM; + else + set_bit(UCONTEXT_PROTECTED, &ctx->user_flags); + + return ret; +} + +static int get_protected(struct i915_gem_context *ctx, +struct drm_i915_gem_context_param *args) +{ + if (!intel_pxp_is_enabled(&ctx->i915->gt.pxp)) + return -ENODEV; + + args->size = 0; + args->value = i915_gem_context_uses_protected_content(ctx); + + return 0; +} + static int ctx_setparam(struct drm_i915_file_private *fpriv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) @@ -2004,6 +2040,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = -EPERM; else if (args->value) i915_gem_context_set_bannable(ctx); + else if (i915_gem_context_uses_protected_content(ctx)) + ret = -EPERM; /* can't clear this for protected contexts */ else i915_gem_context_clear_bannable(ctx); break; @@ -2011,10 +2049,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, case I915_CONTEXT_PARAM_RECOVERABLE: if (args->size) ret = -EINVAL; - else if (args->value) - i915_gem_context_set_recoverable(ctx); - else + else if (!args->value) i915_gem_context_clear_recoverable(ctx); + else if (i915_gem_context_uses_protected_content(ctx)) + ret = -EPERM; /* can't set this for protected contexts */ + else + i915_gem_context_set_recoverable(ctx); break; case I915_CONTEXT_PARAM_PRIORITY: @@ -2041,6 +2081,10 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_ringsize(ctx, args); break; + case I915_CONTEXT_PARAM_PROTECTED_CONTENT: + ret = set_protected(ctx, args); + break; + case I915_CONTEXT_PARAM_BAN_PE
Re: [Intel-gfx] [PATCH v3 13/16] drm/i915/pxp: User interface for Protected buffer
On 29/03/2021 01:57, Daniele Ceraolo Spurio wrote: From: Bommu Krishnaiah This API allows user mode to create Protected buffers. Only contexts marked as protected are allowed to operate on protected buffers. We only allow setting the flags at creation time. All protected objects that have backing storage will be considered invalid when the session is destroyed and they won't be usable anymore. This is a rework of the original code by Bommu Krishnaiah. I've kept authorship unchanged since significant chunks have not been modified. v2: split context changes, fix defines and improve documentation (Chris), add object invalidation logic v3: fix spinlock definition and usage, only validate objects when they're first added to a context lut, only remove them once (Chris), make protected context flag not mandatory in protected object execbuf to avoid abuse (Lionel) Signed-off-by: Bommu Krishnaiah Signed-off-by: Daniele Ceraolo Spurio Cc: Telukuntla Sreedhar Cc: Kondapally Kalyan Cc: Gupta Anshuman Cc: Huang Sean Z Cc: Chris Wilson Cc: Lionel Landwerlin --- drivers/gpu/drm/i915/gem/i915_gem_create.c| 27 ++-- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 16 drivers/gpu/drm/i915/gem/i915_gem_object.c| 6 +++ drivers/gpu/drm/i915/gem/i915_gem_object.h| 12 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 13 ++ drivers/gpu/drm/i915/pxp/intel_pxp.c | 41 +++ drivers/gpu/drm/i915/pxp/intel_pxp.h | 13 ++ drivers/gpu/drm/i915/pxp/intel_pxp_types.h| 5 +++ include/uapi/drm/i915_drm.h | 20 + 9 files changed, 150 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 3ad3413c459f..d02e5938afbe 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -5,6 +5,7 @@ #include "gem/i915_gem_ioctls.h" #include "gem/i915_gem_region.h" +#include "pxp/intel_pxp.h" #include "i915_drv.h" #include "i915_user_extensions.h" @@ -13,7 +14,8 @@ static int i915_gem_create(struct
drm_file *file, struct intel_memory_region *mr, u64 *size_p, - u32 *handle_p) + u32 *handle_p, + u64 user_flags) { struct drm_i915_gem_object *obj; u32 handle; @@ -35,12 +37,17 @@ i915_gem_create(struct drm_file *file, GEM_BUG_ON(size != obj->base.size); + obj->user_flags = user_flags; + ret = drm_gem_handle_create(file, &obj->base, &handle); /* drop reference from allocate - handle holds it now */ i915_gem_object_put(obj); if (ret) return ret; + if (user_flags & I915_GEM_OBJECT_PROTECTED) + intel_pxp_object_add(obj); + *handle_p = handle; *size_p = size; return 0; @@ -89,11 +96,12 @@ i915_gem_dumb_create(struct drm_file *file, return i915_gem_create(file, intel_memory_region_by_type(to_i915(dev), mem_type), - &args->size, &args->handle); + &args->size, &args->handle, 0); } struct create_ext { struct drm_i915_private *i915; + unsigned long user_flags; }; static int __create_setparam(struct drm_i915_gem_object_param *args, @@ -104,6 +112,19 @@ static int __create_setparam(struct drm_i915_gem_object_param *args, return -EINVAL; } + switch (lower_32_bits(args->param)) { + case I915_OBJECT_PARAM_PROTECTED_CONTENT: + if (!intel_pxp_is_enabled(&ext_data->i915->gt.pxp)) + return -ENODEV; + if (args->size) { + return -EINVAL; + } else if (args->data) { + ext_data->user_flags |= I915_GEM_OBJECT_PROTECTED; + return 0; + } + break; + } + return -EINVAL; } @@ -148,5 +169,5 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, return i915_gem_create(file, intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM), - &args->size, &args->handle); + &args->size, &args->handle, ext_data.user_flags); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 72c2470fcfe6..2fb6579ad301 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -20,6 +20,7 @@ #include &qu
Re: [Intel-gfx] [PATCH v2 13/16] drm/i915/pxp: User interface for Protected buffer
On 08/03/2021 22:40, Rodrigo Vivi wrote: On Wed, Mar 03, 2021 at 05:24:34PM -0800, Daniele Ceraolo Spurio wrote: On 3/3/2021 4:10 PM, Daniele Ceraolo Spurio wrote: On 3/3/2021 3:42 PM, Lionel Landwerlin wrote: On 04/03/2021 01:25, Daniele Ceraolo Spurio wrote: On 3/3/2021 3:16 PM, Lionel Landwerlin wrote: On 03/03/2021 23:59, Daniele Ceraolo Spurio wrote: On 3/3/2021 12:39 PM, Lionel Landwerlin wrote: On 01/03/2021 21:31, Daniele Ceraolo Spurio wrote: From: Bommu Krishnaiah This api allow user mode to create Protected buffers. Only contexts marked as protected are allowed to operate on protected buffers. We only allow setting the flags at creation time. All protected objects that have backing storage will be considered invalid when the session is destroyed and they won't be usable anymore. This is a rework of the original code by Bommu Krishnaiah. I've authorship unchanged since significant chunks have not been modified. v2: split context changes, fix defines and improve documentation (Chris), add object invalidation logic Signed-off-by: Bommu Krishnaiah Signed-off-by: Daniele Ceraolo Spurio Cc: Telukuntla Sreedhar Cc: Kondapally Kalyan Cc: Gupta Anshuman Cc: Huang Sean Z Cc: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 27 +++-- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 + drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 +++ drivers/gpu/drm/i915/gem/i915_gem_object.h | 12 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 13 ++ drivers/gpu/drm/i915/pxp/intel_pxp.c | 40 +++ drivers/gpu/drm/i915/pxp/intel_pxp.h | 13 ++ drivers/gpu/drm/i915/pxp/intel_pxp_types.h | 5 +++ include/uapi/drm/i915_drm.h | 22 ++ 9 files changed, 145 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index 3ad3413c459f..d02e5938afbe 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -5,6 +5,7 @@ #include "gem/i915_gem_ioctls.h" #include 
"gem/i915_gem_region.h" +#include "pxp/intel_pxp.h" #include "i915_drv.h" #include "i915_user_extensions.h" @@ -13,7 +14,8 @@ static int i915_gem_create(struct drm_file *file, struct intel_memory_region *mr, u64 *size_p, - u32 *handle_p) + u32 *handle_p, + u64 user_flags) { struct drm_i915_gem_object *obj; u32 handle; @@ -35,12 +37,17 @@ i915_gem_create(struct drm_file *file, GEM_BUG_ON(size != obj->base.size); + obj->user_flags = user_flags; + ret = drm_gem_handle_create(file, &obj->base, &handle); /* drop reference from allocate - handle holds it now */ i915_gem_object_put(obj); if (ret) return ret; + if (user_flags & I915_GEM_OBJECT_PROTECTED) + intel_pxp_object_add(obj); + *handle_p = handle; *size_p = size; return 0; @@ -89,11 +96,12 @@ i915_gem_dumb_create(struct drm_file *file, return i915_gem_create(file, intel_memory_region_by_type(to_i915(dev), mem_type), - &args->size, &args->handle); + &args->size, &args->handle, 0); } struct create_ext { struct drm_i915_private *i915; + unsigned long user_flags; }; static int __create_setparam(struct drm_i915_gem_object_param *args, @@ -104,6 +112,19 @@ static int __create_setparam(struct drm_i915_gem_object_param *args, return -EINVAL; } + switch (lower_32_bits(args->param)) { + case I915_OBJECT_PARAM_PROTECTED_CONTENT: + if (!intel_pxp_is_enabled(&ext_data->i915->gt.pxp)) + return -ENODEV; + if (args->size) { + return -EINVAL; + } else if (args->data) { + ext_data->user_flags |= I915_GEM_OBJECT_PROTECTED; + return 0; + } + break; + } + return -EINVAL; } @@ -148,5 +169,5 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, return i915_gem_create(file, intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM), - &args->size, &args->handle); + &args->size, &args->handle, ext_data.user_flags); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e503c9f789c0..d10c4fcb6aec 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -20,6 +20,7 @@ #include "gt/intel_gt_buffer_pool.h" #include "gt/intel_gt_pm.h" #include "gt/intel_ring.h" +#inc
Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
On 03/03/2021 23:28, Umesh Nerlige Ramappa wrote: Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lacks enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible. v2: (Tvrtko) - document clock reference used - return cpu timestamp always - capture cpu time just before lower dword of cs timestamp v3: (Chris) - use uncore-rpm - use __query_cs_timestamp helper v4: (Lionel) - Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that the cpu timestamp corresponds to the clock id requested. v5: (Tvrtko) - Use normal ktime accessors instead of fast versions - Add more uApi documentation v6: (Lionel) - Move switch out of spinlock v7: (Chris) - cs_timestamp is a misnomer, use cs_cycles instead - return the cs cycle frequency as well in the query v8: - Add platform and engine specific checks v9: (Lionel) - Return 2 cpu timestamps in the query - captured before and after the register read Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 144 ++ FYI, the MR for Mesa : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9407 -Lionel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx