[ANNOUNCE] mesa 21.3.8
Hello everyone,

The eighth and final bugfix release, 21.3.8, is now available.

Please upgrade to the 22.0 series if your hardware supports it, or wait
for the announcement for the upcoming Amber branch for legacy hardware.

Cheers,
  Eric

---

Adam Jackson (1):
      meson: Add "amber" option for automatic LTS build configuration

Alyssa Rosenzweig (6):
      panfrost: Fix FD resource_get_handle
      panfrost: Handle NULL sampler views
      panfrost: Handle NULL samplers
      panfrost: Flush resources when shadowing
      panfrost: Push twice as many uniforms
      panfrost: Fix set_sampler_views for big GL

Connor Abbott (4):
      ir3: Don't always set bindless_tex with readonly images
      ir3/nir: Fix 1d array readonly images
      ir3/ra: Sanitize parallel copy flags better
      util/bitset: Fix off-by-one in __bitset_set_range

Danylo Piliaiev (1):
      turnip: Use LATE_Z when there might be depth/stencil feedback loop

Dave Airlie (5):
      draw/so: don't use pre clip pos if we have a tes either.
      crocus: change the line width workaround for gfx4/5
      gallivm/nir: extract a valid texture index according to exec_mask.
      zink: workaround depth texture mode alpha.
      lavapipe: remove broken workaround for zink depth texturing.

Eric Engestrom (16):
      .pick_status.json: Update to 2106c3bab6bdea736c468fb1866fd0f372cc0baa
      .pick_status.json: Mark 7ec0e2b89351e6e56cb112e00e6c68c6bbc6faea as denominated
      .pick_status.json: Mark 0136545d169adb75e4f9f6b4de38eef0817c1241 as denominated
      .pick_status.json: Mark 62b8daa889daefb2f191a63f370541bf2b807e88 as denominated
      .pick_status.json: Mark 698ae34844b7199b8acc3b4d74a9cad3b903bdef as denominated
      .pick_status.json: Mark 03a80490a47b0b616566c6f56581560694976b1a as denominated
      .pick_status.json: Mark e1964e1dde7bf44ceeaf3fa8b3869e791af4a369 as denominated
      .pick_status.json: Mark 3ef093f697ad9027ba514c7a4a6a10b7bd95bd47 as denominated
      .pick_status.json: Mark 2d1b506acfe55165511a2bb83acb013353e531ab as denominated
      .pick_status.json: Mark 204ea77b0674fb611155bd3ba2e6169cc8646b3f as denominated
      .pick_status.json: Mark a5c7d34fdf8403b0115d5eead7ca67027e93efc7 as denominated
      .pick_status.json: Mark 432700fc61a33e0c040d47d9b7bd8cfe970d35cc as denominated
      .pick_status.json: Mark 4ed7329236a576b6b6f615787bb722b960f32c6b as denominated
      .pick_status.json: Mark 3f7da0c58447979976eb2928625b1f93154f6c57 as denominated
      docs: add release notes for 21.3.8
      VERSION: bump for 21.3.8

Erik Faye-Lund (2):
      docs: remove incorrect drivers from extension
      docs: fixup zink gl 4.3 requirements

Icecream95 (6):
      panfrost: Set PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION
      pan/bi: Check dependencies of both destinations of instructions
      panfrost: Set dirty state in set_shader_buffers
      panfrost: Re-emit descriptors after resource shadowing
      pan/bi: Make disassembler build reproducibly
      panfrost: Fix ubo_mask calculation

Jason Ekstrand (2):
      anv: Don't assume depth/stencil attachments have depth
      lavapipe: Reset the free_cmd_buffers list in TrimCommandPool

Jonathan Gray (6):
      util: unbreak non-linux mips64 build
      util: fix util_cpu_detect_once() build on OpenBSD
      util/u_atomic: fix build on clang archs without 64-bit atomics
      util: fix build with clang 10 on mips64
      util: use correct type in sysctl argument
      radv: use MAJOR_IN_SYSMACROS for sysmacros.h include

Lionel Landwerlin (3):
      anv: fix conditional render for vkCmdDrawIndirectByteCountEXT
      anv: fix fast clear type value with external images
      intel/fs: fix total_scratch computation

Marek Olšák (2):
      amd: add a workaround for an SQ perf counter bug
      radeonsi: fix an assertion failure with register shadowing

Mike Blumenkrantz (16):
      gallivm: avoid division by zero when computing cube face
      zink: always update shader variants when rebinding a gfx program
      zink: use a fence for pipeline cache update jobs
      zink: wait on program cache fences before destroying programs
      zink: fix descriptor cache pointer array allocation
      zink: mark fbfetch push sets as non-cached
      zink: stop leaking descriptor sets
      zink: invalidate non-punted recycled descriptor sets that are not valid
      zink: fix 64bit float shader ops
      llvmpipe: fix debug print iterating in set_framebuffer_state
      llvmpipe: clamp surface clear geometry
      lavapipe: update multisample state after blend state
      aux/trace: rzalloc the context struct
      zink: lower dmod on AMD hardware
      lavapipe: skip format checks for EXTENDED_USAGE
      lavapipe: run nir_opt_copy_prop_vars during optimization loop

Paulo Zanoni (1):
      iris: fix register spilling on compute shaders on XeHP

Pierre-Eric Pelloux-Prayer (3):
      radeonsi: change rounding mode to round to even
      util/slab: add slab_za
Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi
On 18/03/2022 09:38, Lionel Landwerlin wrote:
> Hey Matthew, all,
>
> This sounds like a good thing to have.
> There are a number of DG2 machines where we have a small BAR and this is
> causing more apps to fail.
>
> Anv currently reports 3 memory heaps to the app:
>
>     - local device only (not host visible) -> mapped to lmem
>     - device/cpu -> mapped to smem
>     - local device but also host visible -> mapped to lmem
>
> So we could use this straight away, by just not putting the
> I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the
> first heap.
>
> One thing I don't see in this proposal is how we can get the size of the
> 2 lmem heaps: cpu visible, cpu not visible.
> We could use that to report the appropriate size to the app.
> We probably want to report a new drm_i915_memory_region_info and either:
>
>     - put one of the reserved fields to use to indicate: cpu visible
>     - or define a new enum value in drm_i915_gem_memory_class

Thanks for taking a look at this. Returning the probed CPU visible size
as part of the region query seems reasonable. Something like:

@@ -3074,8 +3074,18 @@ struct drm_i915_memory_region_info {
         /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
         __u64 unallocated_size;
-        /** @rsvd1: MBZ */
-        __u64 rsvd1[8];
+        union {
+                /** @rsvd1: MBZ */
+                __u64 rsvd1[8];
+
+                struct {
+                        /**
+                         * @probed_cpu_visible_size: Memory probed by the driver
+                         * that is CPU accessible. (-1 = unknown)
+                         */
+                        __u64 probed_cpu_visible_size;
+                };
+        };

I will add this in the next version, if no objections.

> Cheers,
>
> -Lionel
>
> On 18/02/2022 13:22, Matthew Auld wrote:
>> Add an entry for the new uapi needed for small BAR on DG2+.
>>
>> Signed-off-by: Matthew Auld
>> Cc: Thomas Hellström
>> Cc: Jon Bloomfield
>> Cc: Daniel Vetter
>> Cc: Jordan Justen
>> Cc: Kenneth Graunke
>> Cc: mesa-dev@lists.freedesktop.org
>> ---
>>  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
>>  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
>>  Documentation/gpu/rfc/index.rst          |   4 +
>>  3 files changed, 197 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
>>  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst
>>
>> diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
>> new file mode 100644
>> index ..fa65835fd608
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_small_bar.h
>> @@ -0,0 +1,153 @@
>> +/**
>> + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
>> + * extension support using struct i915_user_extension.
>> + *
>> + * Note that in the future we want to have our buffer flags here, at least for
>> + * the stuff that is immutable. Previously we would have two ioctls, one to
>> + * create the object with gem_create, and another to apply various parameters,
>> + * however this creates some ambiguity for the params which are considered
>> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
>> + */
>> +struct __drm_i915_gem_create_ext {
>> +        /**
>> +         * @size: Requested size for the object.
>> +         *
>> +         * The (page-aligned) allocated size for the object will be returned.
>> +         *
>> +         * Note that for some devices we have might have further minimum
>> +         * page-size restrictions(larger than 4K), like for device local-memory.
>> +         * However in general the final size here should always reflect any
>> +         * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> +         * extension to place the object in device local-memory.
>> +         */
>> +        __u64 size;
>> +        /**
>> +         * @handle: Returned handle for the object.
>> +         *
>> +         * Object handles are nonzero.
>> +         */
>> +        __u32 handle;
>> +        /**
>> +         * @flags: Optional flags.
>> +         *
>> +         * Supported values:
>> +         *
>> +         * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
>> +         * the object will need to be accessed via the CPU.
>> +         *
>> +         * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
>> +         * only strictly required on platforms where only some of the device
>> +         * memory is directly visible or mappable through the CPU, like on DG2+.
>> +         *
>> +         * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
>> +         * ensure we can always spill the allocation to system memory, if we
>> +         * can't place the object in the mappable part of
>> +         * I915_MEMORY_CLASS_DEVICE.
>> +         *
>> +         * Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
>> +         * will need to enable this hint, if the object can also be placed in
>> +         * I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
>> +         * throw an error otherwise. This also means that such objects will need
>> +         * I915_MEMORY_CLASS_SYSTEM set as a possible placement.
>> +         *
>> +         * Without this hi
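
The reply above reuses the first reserved word of the region info struct for probed_cpu_visible_size. A minimal sketch of why that keeps the struct layout stable, and of how userspace might derive the two lmem heap sizes Lionel asks about, could look like the following. Everything prefixed sketch_ is hypothetical and trimmed down for illustration; these are not the real uapi definitions, probed_size is assumed here to be the total region size, and clamping the visible size to the total is a defensive choice of this sketch rather than anything stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, trimmed-down stand-ins for the region info struct before and
 * after the proposed change. These are NOT the real uapi definitions.
 */
struct sketch_region_info_old {
	uint64_t probed_size;       /* assumed: total size of the region */
	uint64_t unallocated_size;
	uint64_t rsvd1[8];          /* MBZ */
};

struct sketch_region_info_new {
	uint64_t probed_size;
	uint64_t unallocated_size;
	union {
		uint64_t rsvd1[8];  /* MBZ */
		struct {
			uint64_t probed_cpu_visible_size; /* -1 = unknown */
		};
	};
};

/* The union must neither grow the struct nor move the reserved words. */
_Static_assert(sizeof(struct sketch_region_info_new) ==
	       sizeof(struct sketch_region_info_old),
	       "union must not change the struct size");
_Static_assert(offsetof(struct sketch_region_info_new, probed_cpu_visible_size) ==
	       offsetof(struct sketch_region_info_old, rsvd1),
	       "new field must alias the first reserved word");

/* Sizes of the two lmem heaps a driver could report to the app. */
struct sketch_lmem_heaps {
	int known;              /* 0 if the kernel reported -1 (unknown) */
	uint64_t cpu_visible;
	uint64_t cpu_invisible;
};

struct sketch_lmem_heaps
sketch_split_lmem(const struct sketch_region_info_new *info)
{
	struct sketch_lmem_heaps heaps = { 0, 0, 0 };
	uint64_t visible = info->probed_cpu_visible_size;

	if (visible == UINT64_MAX)          /* -1 = unknown */
		return heaps;
	if (visible > info->probed_size)    /* defensive clamp (assumption) */
		visible = info->probed_size;

	heaps.known = 1;
	heaps.cpu_visible = visible;
	heaps.cpu_invisible = info->probed_size - visible;
	return heaps;
}
```

With a region of 8192 bytes of which 256 are CPU visible, sketch_split_lmem() would report 256 visible and 7936 non-visible bytes; with the field set to -1 it reports known == 0 and the caller has to fall back to a single heap.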
Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi
Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is
causing more apps to fail.

Anv currently reports 3 memory heaps to the app:

    - local device only (not host visible) -> mapped to lmem
    - device/cpu -> mapped to smem
    - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the
first heap.

One thing I don't see in this proposal is how we can get the size of the
2 lmem heaps: cpu visible, cpu not visible.
We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either:

    - put one of the reserved fields to use to indicate: cpu visible
    - or define a new enum value in drm_i915_gem_memory_class

Cheers,

-Lionel

On 18/02/2022 13:22, Matthew Auld wrote:
> Add an entry for the new uapi needed for small BAR on DG2+.
>
> Signed-off-by: Matthew Auld
> Cc: Thomas Hellström
> Cc: Jon Bloomfield
> Cc: Daniel Vetter
> Cc: Jordan Justen
> Cc: Kenneth Graunke
> Cc: mesa-dev@lists.freedesktop.org
> ---
>  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
>  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
>  Documentation/gpu/rfc/index.rst          |   4 +
>  3 files changed, 197 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
>  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst
>
> diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
> new file mode 100644
> index ..fa65835fd608
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_small_bar.h
> @@ -0,0 +1,153 @@
> +/**
> + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
> + * extension support using struct i915_user_extension.
> + *
> + * Note that in the future we want to have our buffer flags here, at least for
> + * the stuff that is immutable. Previously we would have two ioctls, one to
> + * create the object with gem_create, and another to apply various parameters,
> + * however this creates some ambiguity for the params which are considered
> + * immutable. Also in general we're phasing out the various SET/GET ioctls.
> + */
> +struct __drm_i915_gem_create_ext {
> +        /**
> +         * @size: Requested size for the object.
> +         *
> +         * The (page-aligned) allocated size for the object will be returned.
> +         *
> +         * Note that for some devices we have might have further minimum
> +         * page-size restrictions(larger than 4K), like for device local-memory.
> +         * However in general the final size here should always reflect any
> +         * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> +         * extension to place the object in device local-memory.
> +         */
> +        __u64 size;
> +        /**
> +         * @handle: Returned handle for the object.
> +         *
> +         * Object handles are nonzero.
> +         */
> +        __u32 handle;
> +        /**
> +         * @flags: Optional flags.
> +         *
> +         * Supported values:
> +         *
> +         * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
> +         * the object will need to be accessed via the CPU.
> +         *
> +         * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
> +         * only strictly required on platforms where only some of the device
> +         * memory is directly visible or mappable through the CPU, like on DG2+.
> +         *
> +         * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
> +         * ensure we can always spill the allocation to system memory, if we
> +         * can't place the object in the mappable part of
> +         * I915_MEMORY_CLASS_DEVICE.
> +         *
> +         * Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
> +         * will need to enable this hint, if the object can also be placed in
> +         * I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
> +         * throw an error otherwise. This also means that such objects will need
> +         * I915_MEMORY_CLASS_SYSTEM set as a possible placement.
> +         *
> +         * Without this hint, the kernel will assume that non-mappable
> +         * I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
> +         * kernel can still migrate the object to the mappable part, as a last
> +         * resort, if userspace ever CPU faults this object, but this might be
> +         * expensive, and so ideally should be avoided.
> +         */
> +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
> +        __u32 flags;
> +        /**
> +         * @extensions: The chain of extensions to apply to this object.
> +         *
> +         * This will be useful in the future when we need to support several
> +         * different extensions, and we need to apply more than one when
> +         * creating the object. See struct i915_user_extension.
> +         *
> +         * If we d
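
The placement rules spelled out in the @flags comment above can be restated as a small userspace-side check. This is only a sketch of the documented constraints, not kernel code and not a call into the real ioctl: the sketch_ names and the local enum are hypothetical stand-ins for I915_MEMORY_CLASS_SYSTEM, I915_MEMORY_CLASS_DEVICE and I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, and the "only strictly required on small BAR platforms like DG2+" condition is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the memory classes named in the patch. */
enum sketch_memory_class {
	SKETCH_MEMORY_CLASS_SYSTEM,
	SKETCH_MEMORY_CLASS_DEVICE,
};

/* Bit 0, matching the (1 << 0) value shown in the quoted header. */
#define SKETCH_FLAG_NEEDS_CPU_ACCESS (1u << 0)

/* Returns true if @wanted appears in the placement list. */
static bool sketch_has_class(const enum sketch_memory_class *placements,
			     size_t count, enum sketch_memory_class wanted)
{
	for (size_t i = 0; i < count; i++) {
		if (placements[i] == wanted)
			return true;
	}
	return false;
}

/*
 * NEEDS_CPU_ACCESS is only valid when the object can be placed in device
 * memory, and one of the placements must also be system memory so the
 * allocation can always spill there.
 */
bool sketch_create_flags_valid(uint32_t flags,
			       const enum sketch_memory_class *placements,
			       size_t count)
{
	if (!(flags & SKETCH_FLAG_NEEDS_CPU_ACCESS))
		return true;

	return sketch_has_class(placements, count, SKETCH_MEMORY_CLASS_DEVICE) &&
	       sketch_has_class(placements, count, SKETCH_MEMORY_CLASS_SYSTEM);
}

/*
 * Buffers captured with EXEC_OBJECT_CAPTURE that can be placed in device
 * memory need the hint; otherwise the execbuf call is documented to fail.
 */
bool sketch_capture_allowed(uint32_t flags,
			    const enum sketch_memory_class *placements,
			    size_t count)
{
	if (!sketch_has_class(placements, count, SKETCH_MEMORY_CLASS_DEVICE))
		return true;

	return (flags & SKETCH_FLAG_NEEDS_CPU_ACCESS) != 0;
}
```

For the first Anv heap Lionel describes (local device only, not host visible), the allocation would simply leave the flag clear, which sketch_create_flags_valid() accepts with a device-only placement list.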