[ANNOUNCE] mesa 21.3.8

2022-03-18 Thread Eric Engestrom
Hello everyone,

The eighth and final bugfix release, 21.3.8, is now available.

Please upgrade to the 22.0 series if your hardware supports it, or wait
for the announcement for the upcoming Amber branch for legacy hardware.

Cheers,
  Eric

---

Adam Jackson (1):
  meson: Add "amber" option for automatic LTS build configuration

Alyssa Rosenzweig (6):
  panfrost: Fix FD resource_get_handle
  panfrost: Handle NULL sampler views
  panfrost: Handle NULL samplers
  panfrost: Flush resources when shadowing
  panfrost: Push twice as many uniforms
  panfrost: Fix set_sampler_views for big GL

Connor Abbott (4):
  ir3: Don't always set bindless_tex with readonly images
  ir3/nir: Fix 1d array readonly images
  ir3/ra: Sanitize parallel copy flags better
  util/bitset: Fix off-by-one in __bitset_set_range

Danylo Piliaiev (1):
  turnip: Use LATE_Z when there might be depth/stencil feedback loop

Dave Airlie (5):
  draw/so: don't use pre clip pos if we have a tes either.
  crocus: change the line width workaround for gfx4/5
  gallivm/nir: extract a valid texture index according to exec_mask.
  zink: workaround depth texture mode alpha.
  lavapipe: remove broken workaround for zink depth texturing.

Eric Engestrom (16):
  .pick_status.json: Update to 2106c3bab6bdea736c468fb1866fd0f372cc0baa
  .pick_status.json: Mark 7ec0e2b89351e6e56cb112e00e6c68c6bbc6faea as denominated
  .pick_status.json: Mark 0136545d169adb75e4f9f6b4de38eef0817c1241 as denominated
  .pick_status.json: Mark 62b8daa889daefb2f191a63f370541bf2b807e88 as denominated
  .pick_status.json: Mark 698ae34844b7199b8acc3b4d74a9cad3b903bdef as denominated
  .pick_status.json: Mark 03a80490a47b0b616566c6f56581560694976b1a as denominated
  .pick_status.json: Mark e1964e1dde7bf44ceeaf3fa8b3869e791af4a369 as denominated
  .pick_status.json: Mark 3ef093f697ad9027ba514c7a4a6a10b7bd95bd47 as denominated
  .pick_status.json: Mark 2d1b506acfe55165511a2bb83acb013353e531ab as denominated
  .pick_status.json: Mark 204ea77b0674fb611155bd3ba2e6169cc8646b3f as denominated
  .pick_status.json: Mark a5c7d34fdf8403b0115d5eead7ca67027e93efc7 as denominated
  .pick_status.json: Mark 432700fc61a33e0c040d47d9b7bd8cfe970d35cc as denominated
  .pick_status.json: Mark 4ed7329236a576b6b6f615787bb722b960f32c6b as denominated
  .pick_status.json: Mark 3f7da0c58447979976eb2928625b1f93154f6c57 as denominated
  docs: add release notes for 21.3.8
  VERSION: bump for 21.3.8

Erik Faye-Lund (2):
  docs: remove incorrect drivers from extension
  docs: fixup zink gl 4.3 requirements

Icecream95 (6):
  panfrost: Set PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION
  pan/bi: Check dependencies of both destinations of instructions
  panfrost: Set dirty state in set_shader_buffers
  panfrost: Re-emit descriptors after resource shadowing
  pan/bi: Make disassembler build reproducibly
  panfrost: Fix ubo_mask calculation

Jason Ekstrand (2):
  anv: Don't assume depth/stencil attachments have depth
  lavapipe: Reset the free_cmd_buffers list in TrimCommandPool

Jonathan Gray (6):
  util: unbreak non-linux mips64 build
  util: fix util_cpu_detect_once() build on OpenBSD
  util/u_atomic: fix build on clang archs without 64-bit atomics
  util: fix build with clang 10 on mips64
  util: use correct type in sysctl argument
  radv: use MAJOR_IN_SYSMACROS for sysmacros.h include

Lionel Landwerlin (3):
  anv: fix conditional render for vkCmdDrawIndirectByteCountEXT
  anv: fix fast clear type value with external images
  intel/fs: fix total_scratch computation

Marek Olšák (2):
  amd: add a workaround for an SQ perf counter bug
  radeonsi: fix an assertion failure with register shadowing

Mike Blumenkrantz (16):
  gallivm: avoid division by zero when computing cube face
  zink: always update shader variants when rebinding a gfx program
  zink: use a fence for pipeline cache update jobs
  zink: wait on program cache fences before destroying programs
  zink: fix descriptor cache pointer array allocation
  zink: mark fbfetch push sets as non-cached
  zink: stop leaking descriptor sets
  zink: invalidate non-punted recycled descriptor sets that are not valid
  zink: fix 64bit float shader ops
  llvmpipe: fix debug print iterating in set_framebuffer_state
  llvmpipe: clamp surface clear geometry
  lavapipe: update multisample state after blend state
  aux/trace: rzalloc the context struct
  zink: lower dmod on AMD hardware
  lavapipe: skip format checks for EXTENDED_USAGE
  lavapipe: run nir_opt_copy_prop_vars during optimization loop

Paulo Zanoni (1):
  iris: fix register spilling on compute shaders on XeHP

Pierre-Eric Pelloux-Prayer (3):
  radeonsi: change rounding mode to round to even
  util/slab: add slab_za

Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi

2022-03-18 Thread Matthew Auld

On 18/03/2022 09:38, Lionel Landwerlin wrote:

Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is 
causing more apps to fail.


Anv currently reports 3 memory heaps to the app :

     - local device only (not host visible) -> mapped to lmem
     - device/cpu -> mapped to smem
     - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the 
first heap.


One thing I don't see in this proposal is how we can get the sizes of the 
2 lmem heaps: CPU visible and CPU not visible.

We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either:
     - put one of the reserved fields to use to indicate: cpu visible
     - or define a new enum value in drm_i915_gem_memory_class


Thanks for taking a look at this. Returning the probed CPU visible size 
as part of the region query seems reasonable. Something like:


@@ -3074,8 +3074,18 @@ struct drm_i915_memory_region_info {
     /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
     __u64 unallocated_size;

-    /** @rsvd1: MBZ */
-    __u64 rsvd1[8];
+    union {
+        /** @rsvd1: MBZ */
+        __u64 rsvd1[8];
+
+        struct {
+            /**
+             * @probed_cpu_visible_size: Memory probed by the driver
+             * that is CPU accessible. (-1 = unknown)
+             */
+            __u64 probed_cpu_visible_size;
+        };
+    };


I will add this in the next version, if no objections.
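
As a rough illustration of the hunk above, here is a minimal C sketch. It assumes a stand-in struct (`region_info_tail`, a hypothetical name) that mirrors only the fields shown in the hunk, and a hypothetical helper `cpu_invisible_size()` that takes the region's `probed_size` as a parameter. It is not the kernel header. It only checks that overlaying the new field on `rsvd1` leaves the size unchanged, and shows how userspace might derive the non-CPU-visible part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail of struct drm_i915_memory_region_info;
 * only the fields visible in the hunk are mirrored here. */
struct region_info_tail {
	uint64_t unallocated_size;
	union {
		uint64_t rsvd1[8];                        /* old view: MBZ */
		struct {
			uint64_t probed_cpu_visible_size; /* new view */
		};
	};
};

/* The union keeps the layout at 1 + 8 u64 slots, as before the change. */
static_assert(sizeof(struct region_info_tail) == 9 * sizeof(uint64_t),
	      "overlay must not grow the struct");
/* The new field occupies the first reserved slot. */
static_assert(offsetof(struct region_info_tail, probed_cpu_visible_size) ==
	      offsetof(struct region_info_tail, rsvd1),
	      "new field must alias rsvd1[0]");

/* Hypothetical helper: bytes of the region that are not CPU visible.
 * Returns UINT64_MAX when the visible size is unknown (-1). */
static uint64_t cpu_invisible_size(uint64_t probed_size,
				   uint64_t probed_cpu_visible_size)
{
	if (probed_cpu_visible_size == UINT64_MAX)
		return UINT64_MAX;
	if (probed_cpu_visible_size >= probed_size)
		return 0;
	return probed_size - probed_cpu_visible_size;
}
```

With values like these, a driver could report one heap sized by the CPU-visible part and another sized by the remainder. How unknown (-1) values should be handled is a policy choice left open here.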



Cheers,

-Lionel


On 18/02/2022 13:22, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 197 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..fa65835fd608
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,153 @@
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be returned.
+ *
+ * Note that for some devices we might have further minimum
+ * page-size restrictions (larger than 4K), like for device local-memory.
+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the device
+ * memory is directly visible or mappable through the CPU, like on DG2+.
+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, if we
+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
+ * will need to enable this hint, if the object can also be placed in
+ * I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
+ * throw an error otherwise. This also means that such objects will need
+ * I915_MEMORY_CLASS_SYSTEM set as a possible placement.
+ *
+ * Without this hi

Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi

2022-03-18 Thread Lionel Landwerlin

Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is 
causing more apps to fail.


Anv currently reports 3 memory heaps to the app :

    - local device only (not host visible) -> mapped to lmem
    - device/cpu -> mapped to smem
    - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the 
first heap.
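
A minimal C sketch of that idea, assuming hypothetical names for the three heaps (`enum heap_kind`) and a hypothetical helper `heap_create_flags()`. Only the flag name and its value `(1 << 0)` come from the RFC header in the patch, and this is not how anv is actually structured.

```c
#include <stdint.h>

/* Value as defined in the RFC header from the patch. */
#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)

/* Hypothetical labels for the three heaps listed above. */
enum heap_kind {
	HEAP_LMEM_DEVICE_ONLY,  /* local device only, not host visible */
	HEAP_SMEM,              /* device/cpu, backed by system memory */
	HEAP_LMEM_HOST_VISIBLE, /* local device, also host visible */
};

/* Hypothetical helper: gem_create_ext flags for an allocation from a heap.
 * Only the host-visible lmem heap asks for CPU access. The RFC says the
 * flag is only valid for I915_MEMORY_CLASS_DEVICE placements, so the smem
 * heap never sets it. */
static uint32_t heap_create_flags(enum heap_kind kind)
{
	switch (kind) {
	case HEAP_LMEM_HOST_VISIBLE:
		return I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS;
	case HEAP_LMEM_DEVICE_ONLY:
	case HEAP_SMEM:
	default:
		return 0;
	}
}
```

Per the RFC text, an allocation that sets the flag would also need I915_MEMORY_CLASS_SYSTEM among its placements, which this sketch does not model.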


One thing I don't see in this proposal is how we can get the sizes of the 
2 lmem heaps: CPU visible and CPU not visible.

We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either:
    - put one of the reserved fields to use to indicate: cpu visible
    - or define a new enum value in drm_i915_gem_memory_class

Cheers,

-Lionel


On 18/02/2022 13:22, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 197 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..fa65835fd608
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,153 @@
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we might have further minimum
+* page-size restrictions (larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
+* will need to enable this hint, if the object can also be placed in
+* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
+* throw an error otherwise. This also means that such objects will need
+* I915_MEMORY_CLASS_SYSTEM set as a possible placement.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+* kernel can still migrate the object to the mappable part, as a last
+* resort, if userspace ever CPU faults this object, but this might be
+* expensive, and so ideally should be avoided.
+*/
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
+   __u32 flags;
+   /**
+* @extensions: The chain of extensions to apply to this object.
+*
+* This will be useful in the future when we need to support several
+* different extensions, and we need to apply more than one when
+* creating the object. See struct i915_user_extension.
+*
+* If we d