Re: [PATCH v2 1/2] drm/ttm: Change ttm_device_init to use a struct instead of multiple bools

2024-10-02 Thread Zack Rusin
On Wed, Oct 2, 2024 at 8:24 AM Thomas Hellström
 wrote:
>
> The ttm_device_init function uses multiple bool arguments. That means
> readability in the caller becomes poor, and all callers need to change if
> yet another bool is added.
>
> Instead use a struct with multiple single-bit flags. This addresses both
> problems. Prefer it over using defines or enums with explicit bit shifts,
> since converting to and from these bit values uses logical operations or
> tests which are implicit with the struct usage, and of course type-checking.
>
> This is in preparation of adding yet another bool flag parameter to the
> function.
>
> Cc: Christian König 
> Cc: amd-...@lists.freedesktop.org
> Cc: intel-...@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Cc: spice-de...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: Zack Rusin 
> Cc: 
> Cc: Sui Jingfeng 
> Cc: 
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  6 --
>  drivers/gpu/drm/drm_gem_vram_helper.c |  7 ---
>  drivers/gpu/drm/i915/intel_region_ttm.c   |  3 ++-
>  drivers/gpu/drm/loongson/lsdc_ttm.c   |  5 -
>  drivers/gpu/drm/nouveau/nouveau_ttm.c |  7 +--
>  drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
>  drivers/gpu/drm/radeon/radeon_ttm.c   |  6 --
>  drivers/gpu/drm/ttm/tests/ttm_bo_test.c   | 16 +++
>  .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  3 ++-
>  drivers/gpu/drm/ttm/tests/ttm_device_test.c   | 16 ---
>  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 20 ---
>  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |  6 ++
>  drivers/gpu/drm/ttm/ttm_device.c  |  7 +++
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  4 ++--
>  drivers/gpu/drm/xe/xe_device.c|  3 ++-
>  include/drm/ttm/ttm_device.h  | 12 ++-
>  16 files changed, 71 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 74adb983ab03..e43635ac54fd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1853,8 +1853,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
>adev_to_drm(adev)->anon_inode->i_mapping,
>adev_to_drm(adev)->vma_offset_manager,
> -  adev->need_swiotlb,
> -  dma_addressing_limited(adev->dev));
> +  (struct ttm_device_init_flags){
> +  .use_dma_alloc = adev->need_swiotlb,
> +  .use_dma32 = 
> dma_addressing_limited(adev->dev)
> +  });
> if (r) {
> DRM_ERROR("failed initializing buffer object driver(%d).\n", 
> r);
> return r;
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
> b/drivers/gpu/drm/drm_gem_vram_helper.c
> index 22b1fe9c03b8..7c3165b00378 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -931,9 +931,10 @@ static int drm_vram_mm_init(struct drm_vram_mm *vmm, 
> struct drm_device *dev,
> vmm->vram_size = vram_size;
>
> ret = ttm_device_init(&vmm->bdev, &bo_driver, dev->dev,
> -dev->anon_inode->i_mapping,
> -dev->vma_offset_manager,
> -false, true);
> + dev->anon_inode->i_mapping,
> + dev->vma_offset_manager,
> + (struct ttm_device_init_flags)
> + {.use_dma32 = true});
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c 
> b/drivers/gpu/drm/i915/intel_region_ttm.c
> index 04525d92bec5..db34da63814c 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> @@ -34,7 +34,8 @@ int intel_region_ttm_device_init(struct drm_i915_private 
> *dev_priv)
>
> return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
>drm->dev, drm->anon_inode->i_mapping,
> -  drm->vma_offset_manager, false, false);
> +  drm->vma_offset_manager,
> +  (struct ttm_device_in
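The single-bit flag-struct pattern the commit message argues for looks
roughly like this in isolation (a minimal sketch with invented names,
not the actual TTM declarations):

struct example_init_flags {
	/* Single-bit flags: readable at the call site and type-checked,
	 * unlike bare bool parameters or manually shifted bit masks.
	 */
	unsigned int use_dma_alloc : 1;
	unsigned int use_dma32 : 1;
};

static int example_device_init(struct example_init_flags flags)
{
	if (flags.use_dma32)
		return 32;
	return flags.use_dma_alloc ? 64 : 0;
}

/* Call sites name each flag explicitly; omitted flags default to 0,
 * and adding a new flag doesn't require touching existing callers:
 *
 *	example_device_init((struct example_init_flags){
 *		.use_dma32 = true,
 *	});
 */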

Re: [PATCH v4 71/80] drm/vmwgfx: Run DRM default client setup

2024-09-09 Thread Zack Rusin
On Mon, Sep 9, 2024 at 7:37 AM Thomas Zimmermann  wrote:
>
> Call drm_client_setup() to run the kernel's default client setup
> for DRM. Set fbdev_probe in struct drm_driver, so that the client
> setup can start the common fbdev client.
>
> Signed-off-by: Thomas Zimmermann 
> Cc: Zack Rusin 
> Cc: Broadcom internal kernel review list 
> 
> Acked-by: Javier Martinez Canillas 

Quick note: I love what you did with drm client and drm fbdev. Thanks
a lot for that work!

Reviewed-by: Zack Rusin 

z


Re: [PATCH] drm/vmwgfx: Add tracepoints

2024-09-06 Thread Zack Rusin
On Fri, Sep 6, 2024 at 11:26 AM Ian Forbes  wrote:
>
> On Thu, Sep 5, 2024 at 10:59 PM Zack Rusin  wrote:
> >
> >
> > In general it looks good but what's the reason for the submit_time?
> >
> > z
>
> So you can get an approximate measure of how long each command buffer takes.
> You can then use it to construct a histogram or look for outliers
> using bpftrace.
> Useful when doing performance analysis to determine if slowdowns are being
> caused by the host or the guest driver.
>
> $ sudo bpftrace -e 'tracepoint:vmwgfx:vmwgfx_cmdbuf_done {
> if (args->status == 1) { $elapsed = (jiffies -
> args->header->submit_time); @exec_times = hist($elapsed); } }'
> Attaching 1 probe...

Can't you do the same with just:
bpftrace -e 'kprobe:vmw_cmdbuf_header_submit { @start[tid] = nsecs; }
kretprobe:vmw_cmdbuf_header_submit /@start[tid]/ { @ns[comm] =
hist(nsecs - @start[tid]); delete(@start[tid]); }'
Or kfunc/kretfunc if you want to condition it based on args->status?

z
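The kfunc/kretfunc variant suggested above would look roughly like this
(a sketch; it assumes a bpftrace recent enough for BTF-based probes and
a kernel that exposes vmw_cmdbuf_header_submit via BTF):

bpftrace -e '
kfunc:vmw_cmdbuf_header_submit { @start[tid] = nsecs; }
kretfunc:vmw_cmdbuf_header_submit /@start[tid]/ {
	@ns[comm] = hist(nsecs - @start[tid]);
	delete(@start[tid]);
}'

Unlike the kprobe version, the typed fentry/fexit arguments would also
allow filtering on a status value here, assuming the probed function
exposes one.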


Re: [PATCH] drm/vmwgfx: Add tracepoints

2024-09-05 Thread Zack Rusin
On Thu, Sep 5, 2024 at 2:17 PM Ian Forbes  wrote:
>
> Adds the necessary files to create tracepoints for the vmwgfx driver.
>
> Adds a single tracepoint for command buffer completion. This tracepoint
> can be used to time command buffer execution time and to decode command
> buffer errors. The submission time is also now recorded when the command
> buffer is submitted to hardware.
>
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/Makefile|  2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c |  9 +++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_trace.c  | 32 +
>  drivers/gpu/drm/vmwgfx/vmwgfx_trace.h  | 62 ++
>  4 files changed, 102 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_trace.c
>  create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_trace.h
>
> diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
> index 46a4ab688a7f..482c1935bde6 100644
> --- a/drivers/gpu/drm/vmwgfx/Makefile
> +++ b/drivers/gpu/drm/vmwgfx/Makefile
> @@ -10,6 +10,6 @@ vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_kms.o 
> vmwgfx_drv.o \
> vmwgfx_simple_resource.o vmwgfx_va.o vmwgfx_blit.o \
> vmwgfx_validation.o vmwgfx_page_dirty.o vmwgfx_streamoutput.o \
> vmwgfx_devcaps.o ttm_object.o vmwgfx_system_manager.o \
> -   vmwgfx_gem.o vmwgfx_vkms.o
> +   vmwgfx_gem.o vmwgfx_vkms.o vmwgfx_trace.o
>
>  obj-$(CONFIG_DRM_VMWGFX) := vmwgfx.o
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c
> index 94e8982f5616..1ac7f382cdb1 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c
> @@ -27,6 +27,7 @@
>
>  #include "vmwgfx_bo.h"
>  #include "vmwgfx_drv.h"
> +#include "vmwgfx_trace.h"
>
>  #include 
>
> @@ -141,6 +142,7 @@ struct vmw_cmdbuf_man {
>   * @man: The command buffer manager.
>   * @cb_header: Device command buffer header, allocated from a DMA pool.
>   * @cb_context: The device command buffer context.
> + * @inline_space: Whether inline command buffer space is used.
>   * @list: List head for attaching to the manager lists.
>   * @node: The range manager node.
>   * @handle: The DMA address of @cb_header. Handed to the device on command
> @@ -148,19 +150,20 @@ struct vmw_cmdbuf_man {
>   * @cmd: Pointer to the command buffer space of this buffer.
>   * @size: Size of the command buffer space of this buffer.
>   * @reserved: Reserved space of this buffer.
> - * @inline_space: Whether inline command buffer space is used.
> + * @submit_time: When the CB was submitted to hardware in jiffies.

In general it looks good but what's the reason for the submit_time?

z
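For reference, tracepoints like the one discussed here are declared with
TRACE_EVENT(). The actual vmwgfx_trace.h is truncated from this excerpt,
so the event layout below is an assumption inferred from the bpftrace
invocations in this thread, not the patch contents:

/* Hypothetical sketch of a cmdbuf-completion tracepoint; the fields
 * mirror the args->status and submit_time accesses seen above.
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM vmwgfx

#include <linux/tracepoint.h>

TRACE_EVENT(vmwgfx_cmdbuf_done,
	TP_PROTO(u32 status, unsigned long submit_time),
	TP_ARGS(status, submit_time),

	TP_STRUCT__entry(
		__field(u32, status)
		__field(unsigned long, submit_time)
	),

	TP_fast_assign(
		__entry->status = status;
		__entry->submit_time = submit_time;
	),

	TP_printk("status=%u submit_time=%lu",
		  __entry->status, __entry->submit_time)
);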


[PATCH v2] drm/vmwgfx: Cleanup kms setup without 3d

2024-08-26 Thread Zack Rusin
Do not validate format equality for the non-3d cases, to allow xrgb-to-argb
copies, and make sure the dx binding flags are only used
on dx-compatible surfaces.

Fixes basic 2d kms setup on configurations without 3d. There's little
practical benefit to it because kms framebuffer coherence is disabled
on configurations without 3d, but with these changes the code actually
makes sense.

v2: Remove the now unused format variable

Signed-off-by: Zack Rusin 
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Cc: Maaz Mombasawala 
Cc: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 29 -
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c |  9 +---
 2 files changed, 6 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 288ed0bb75cb..282b6153bcdd 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1283,7 +1283,6 @@ static int vmw_kms_new_framebuffer_surface(struct 
vmw_private *dev_priv,
 {
struct drm_device *dev = &dev_priv->drm;
struct vmw_framebuffer_surface *vfbs;
-   enum SVGA3dSurfaceFormat format;
struct vmw_surface *surface;
int ret;
 
@@ -1320,34 +1319,6 @@ static int vmw_kms_new_framebuffer_surface(struct 
vmw_private *dev_priv,
return -EINVAL;
}
 
-   switch (mode_cmd->pixel_format) {
-   case DRM_FORMAT_ARGB8888:
-   format = SVGA3D_A8R8G8B8;
-   break;
-   case DRM_FORMAT_XRGB8888:
-   format = SVGA3D_X8R8G8B8;
-   break;
-   case DRM_FORMAT_RGB565:
-   format = SVGA3D_R5G6B5;
-   break;
-   case DRM_FORMAT_XRGB1555:
-   format = SVGA3D_A1R5G5B5;
-   break;
-   default:
-   DRM_ERROR("Invalid pixel format: %p4cc\n",
- &mode_cmd->pixel_format);
-   return -EINVAL;
-   }
-
-   /*
-* For DX, surface format validation is done when surface->scanout
-* is set.
-*/
-   if (!has_sm4_context(dev_priv) && format != surface->metadata.format) {
-   DRM_ERROR("Invalid surface format for requested mode.\n");
-   return -EINVAL;
-   }
-
vfbs = kzalloc(sizeof(*vfbs), GFP_KERNEL);
if (!vfbs) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 1625b30d9970..5721c74da3e0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2276,9 +2276,12 @@ int vmw_dumb_create(struct drm_file *file_priv,
const struct SVGA3dSurfaceDesc *desc = vmw_surface_get_desc(format);
SVGA3dSurfaceAllFlags flags = SVGA3D_SURFACE_HINT_TEXTURE |
  SVGA3D_SURFACE_HINT_RENDERTARGET |
- SVGA3D_SURFACE_SCREENTARGET |
- SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
- SVGA3D_SURFACE_BIND_RENDER_TARGET;
+ SVGA3D_SURFACE_SCREENTARGET;
+
+   if (vmw_surface_is_dx_screen_target_format(format)) {
+   flags |= SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
+SVGA3D_SURFACE_BIND_RENDER_TARGET;
+   }
 
/*
 * Without mob support we're just going to use raw memory buffer
-- 
2.43.0



[PATCH] drm/vmwgfx: Cleanup kms setup without 3d

2024-08-26 Thread Zack Rusin
Do not validate format equality for the non-3d cases, to allow xrgb-to-argb
copies, and make sure the dx binding flags are only used
on dx-compatible surfaces.

Fixes basic 2d kms setup on configurations without 3d. There's little
practical benefit to it because kms framebuffer coherence is disabled
on configurations without 3d, but with these changes the code actually
makes sense.

Signed-off-by: Zack Rusin 
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Cc: Maaz Mombasawala 
Cc: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 9 -
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 9 ++---
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 288ed0bb75cb..b5fc5a9e123a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1339,15 +1339,6 @@ static int vmw_kms_new_framebuffer_surface(struct 
vmw_private *dev_priv,
return -EINVAL;
}
 
-   /*
-* For DX, surface format validation is done when surface->scanout
-* is set.
-*/
-   if (!has_sm4_context(dev_priv) && format != surface->metadata.format) {
-   DRM_ERROR("Invalid surface format for requested mode.\n");
-   return -EINVAL;
-   }
-
vfbs = kzalloc(sizeof(*vfbs), GFP_KERNEL);
if (!vfbs) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 1625b30d9970..5721c74da3e0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2276,9 +2276,12 @@ int vmw_dumb_create(struct drm_file *file_priv,
const struct SVGA3dSurfaceDesc *desc = vmw_surface_get_desc(format);
SVGA3dSurfaceAllFlags flags = SVGA3D_SURFACE_HINT_TEXTURE |
  SVGA3D_SURFACE_HINT_RENDERTARGET |
- SVGA3D_SURFACE_SCREENTARGET |
- SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
- SVGA3D_SURFACE_BIND_RENDER_TARGET;
+ SVGA3D_SURFACE_SCREENTARGET;
+
+   if (vmw_surface_is_dx_screen_target_format(format)) {
+   flags |= SVGA3D_SURFACE_BIND_SHADER_RESOURCE |
+SVGA3D_SURFACE_BIND_RENDER_TARGET;
+   }
 
/*
 * Without mob support we're just going to use raw memory buffer
-- 
2.43.0



Re: [REGRESSION][BISECTED] vmwgfx crashes with command buffer error after update

2024-08-16 Thread Zack Rusin
On Thu, Aug 15, 2024 at 4:30 PM Andreas Piesk  wrote:
>
> Hello,
>
> the bug was first reported on VMware Workstation by rdkehn.
>
> On my setup (archlinux text mode only VM on ESXi 8.0U3 latest) the kernel 
> does NOT crash, the screen just goes dark after switching the console from
>
> [2.745694] Console: switching to colour dummy device 80x25
>
> to
>
> [2.771998] Console: switching to colour frame buffer device 160x50
>
> You see the VMware remote console resizing, then going black and from this 
> point no more output.
>
> I have attached boot_journal and vmware.log from my setup. VM uses EFI boot 
> and SVGA with defaults as display adapter, I attached the vmx file too.

Thanks! I see. I have a patch out that fixes it, but in general I
think those vm's with 16mb for graphics are very risky and I'd suggest
bumping them to at least 32mb. The vram portion can stay at 16mb, but
the graphicsMemoryKB can safely be set to a fourth or even half of
memsize (in your config 256mb or even 512mb). That will make the vm's
a lot safer and allow actual ui usage, because with the console being
pinned we don't have a lot of wiggle room otherwise and we can't
migrate pinned framebuffers.
The patch that "regressed" this makes dumb buffers surfaces that
actually respect pinning, but as long as you don't use the gpu on the
host side things will be ok. Otherwise we can't make a config with
16mb of available graphics memory work with graphics acceleration.

z
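For a guest with memsize = 1024, the suggested tuning would look roughly
like this in the .vmx file (the option names are an assumption based on
common VMware SVGA settings; verify them against your Workstation/ESXi
documentation):

svga.vramSize = "16777216"
svga.graphicsMemoryKB = "262144"

That keeps the vram portion at 16mb while giving the device a fourth of
a 1gb memsize (256mb) of graphics memory.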


[PATCH 3/3] drm/vmwgfx: Disable coherent dumb buffers without 3d

2024-08-16 Thread Zack Rusin
Coherent surfaces only make sense if the host renders to them using
accelerated apis. Without 3d the entire content of dumb buffers stays
in the guest, making all of the extra work they're doing to synchronize
between guest and host useless.

Configurations without 3d also tend to run with very low graphics
memory limits. The pinned console fb, mob cursors and graphical login
manager tend to exhaust the 16MB of graphics memory that those guests use.

Fix it by making sure the coherent dumb buffers are only used on
configs with 3d enabled.

Signed-off-by: Zack Rusin 
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Reported-by: Christian Heusel 
Closes: 
https://lore.kernel.org/all/0d0330f3-2ac0-4cd5-8075-7f1cbaf72...@heusel.eu
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Cc: Maaz Mombasawala 
Cc: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 8ae6a761c900..1625b30d9970 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2283,9 +2283,11 @@ int vmw_dumb_create(struct drm_file *file_priv,
/*
 * Without mob support we're just going to use raw memory buffer
 * because we wouldn't be able to support full surface coherency
-* without mobs
+* without mobs. There is also no reason to support surface coherency
+* without 3d (i.e. gpu usage on the host) because then all the
+* content is going to be rendered guest side.
 */
-   if (!dev_priv->has_mob) {
+   if (!dev_priv->has_mob || !vmw_supports_3d(dev_priv)) {
int cpp = DIV_ROUND_UP(args->bpp, 8);
 
switch (cpp) {
-- 
2.43.0



[PATCH 1/3] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-16 Thread Zack Rusin
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.

Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.

Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.19+
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t 
size)
void *virtual;
int ret;
 
+   atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, 
size_t size)
  */
 void vmw_bo_unmap(struct vmw_bo *vbo)
 {
+   int map_count;
+
if (vbo->map.bo == NULL)
return;
 
-   ttm_bo_kunmap(&vbo->map);
-   vbo->map.bo = NULL;
+   map_count = atomic_dec_return(&vbo->map_count);
+
+   if (!map_count) {
+   ttm_bo_kunmap(&vbo->map);
+   vbo->map.bo = NULL;
+   }
 }
 
 
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+   atomic_set(&vmw_bo->map_count, 0);
 
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
  * @map: Kmap object for semi-persistent mappings
  * @res_tree: RB tree of resources using this buffer object as a backing MOB
  * @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
  * @cpu_writers: Number of synccpu write grabs. Protected by reservation when
  * increased. May be decreased without reservation.
  * @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
 
+   atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted.  Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
-- 
2.43.0
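With the counter in place, the problematic interleaving from the commit
message above resolves cleanly. A sketch of the two kms paths using the
patched helpers (assuming the caller already holds the bo reservation):

#include "vmwgfx_bo.h"

static void example_cursor_paths(struct vmw_bo *vbo)
{
	void *update  = vmw_bo_map_and_cache(vbo); /* map_count 0 -> 1 */
	void *compare = vmw_bo_map_and_cache(vbo); /* map_count 1 -> 2 */

	/* ... compare the cursor contents through 'compare' ... */
	(void)compare;
	vmw_bo_unmap(vbo); /* map_count 2 -> 1: mapping stays alive */

	/* ... the update path still reads valid data through 'update' ... */
	(void)update;
	vmw_bo_unmap(vbo); /* map_count 1 -> 0: ttm_bo_kunmap() runs */
}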



[PATCH 2/3] drm/vmwgfx: Fix prime with external buffers

2024-08-16 Thread Zack Rusin
Make sure that for external buffers mapping goes through the dma_buf
interface instead of trying to access pages directly.

External buffers might not provide direct access to readable/writable
pages, so to make sure the bo's created from external dma_bufs can be
read, the dma_buf interface has to be used.

Fixes crashes in IGT's kms_prime with vgem. Regular desktop usage won't
trigger this due to the fact that virtual machines will not have
multiple GPUs, but it enables better test coverage in IGT.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc:  # v6.6+
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 114 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  12 +--
 3 files changed, 118 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index 717d624e9a05..890a66a2361f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -27,6 +27,8 @@
  **/
 
 #include "vmwgfx_drv.h"
+
+#include "vmwgfx_bo.h"
 #include 
 
 /*
@@ -420,13 +422,105 @@ static int vmw_bo_cpu_blit_line(struct 
vmw_bo_blit_line_data *d,
return 0;
 }
 
+static void *map_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   struct vmw_private *vmw =
+   container_of(bo->tbo.bdev, struct vmw_private, bdev);
+   void *ptr = NULL;
+   int ret;
+
+   if (bo->tbo.base.import_attach) {
+   ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
+   if (ret) {
+   drm_dbg_driver(&vmw->drm,
+  "Wasn't able to map external bo!\n");
+   goto out;
+   }
+   ptr = map->vaddr;
+   } else {
+   ptr = vmw_bo_map_and_cache(bo);
+   }
+
+out:
+   return ptr;
+}
+
+static void unmap_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   if (bo->tbo.base.import_attach)
+   dma_buf_vunmap(bo->tbo.base.dma_buf, map);
+   else
+   vmw_bo_unmap(bo);
+}
+
+static int vmw_external_bo_copy(struct vmw_bo *dst, u32 dst_offset,
+   u32 dst_stride, struct vmw_bo *src,
+   u32 src_offset, u32 src_stride,
+   u32 width_in_bytes, u32 height,
+   struct vmw_diff_cpy *diff)
+{
+   struct vmw_private *vmw =
+   container_of(dst->tbo.bdev, struct vmw_private, bdev);
+   size_t dst_size = dst->tbo.resource->size;
+   size_t src_size = src->tbo.resource->size;
+   struct iosys_map dst_map = {0};
+   struct iosys_map src_map = {0};
+   int ret, i;
+   int x_in_bytes;
+   u8 *vsrc;
+   u8 *vdst;
+
+   vsrc = map_external(src, &src_map);
+   if (!vsrc) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map src\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vdst = map_external(dst, &dst_map);
+   if (!vdst) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map dst\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vsrc += src_offset;
+   vdst += dst_offset;
+   if (src_stride == dst_stride) {
+   dst_size -= dst_offset;
+   src_size -= src_offset;
+   memcpy(vdst, vsrc,
+  min(dst_stride * height, min(dst_size, src_size)));
+   } else {
+   WARN_ON(dst_stride < width_in_bytes);
+   for (i = 0; i < height; ++i) {
+   memcpy(vdst, vsrc, width_in_bytes);
+   vsrc += src_stride;
+   vdst += dst_stride;
+   }
+   }
+
+   x_in_bytes = (dst_offset % dst_stride);
+   diff->rect.x1 =  x_in_bytes / diff->cpp;
+   diff->rect.y1 = ((dst_offset - x_in_bytes) / dst_stride);
+   diff->rect.x2 = diff->rect.x1 + width_in_bytes / diff->cpp;
+   diff->rect.y2 = diff->rect.y1 + height;
+
+   ret = 0;
+out:
+   unmap_external(src, &src_map);
+   unmap_external(dst, &dst_map);
+
+   return ret;
+}
+
 /**
  * vmw_bo_cpu_blit - in-kernel cpu blit.
  *
- * @dst: Destination buffer object.
+ * @vmw_dst: Destination buffer object.
  * @dst_offset: Destination offset of blit start in bytes.
  * @dst_stride: Destination stride in bytes.
- * @src: Source buffer object.
+ * @vmw_src: Source buffer object.
  * @src_offset: Source offset of blit start in bytes.
  * @src_stride: Source stride in bytes.
  *

[PATCH 0/3] Various prime/dumb buffer fixes

2024-08-16 Thread Zack Rusin
This is the same series I've sent out earlier, but with one extra patch
that fixes the dumb buffer coherency on low-memory systems.

The second patch has also been updated to not use math functions.

Zack Rusin (3):
  drm/vmwgfx: Prevent unmapping active read buffers
  drm/vmwgfx: Fix prime with external buffers
  drm/vmwgfx: Disable coherent dumb buffers without 3d

 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c| 114 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c  |  13 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h  |   3 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c|  12 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c |   6 +-
 6 files changed, 136 insertions(+), 16 deletions(-)

-- 
2.43.0



Re: [REGRESSION][BISECTED] vmwgfx crashes with command buffer error after update

2024-08-15 Thread Zack Rusin
On Thu, Aug 15, 2024 at 1:48 PM Christian Heusel  wrote:
>
> Hello Zack,
>
> the user rdkehn (in CC) on the Arch Linux Forums reports that after
> updating to the 6.10.4 stable kernel inside of their VM Workstation the
> driver crashes with the error attached below. This error is also present
> on the latest mainline release 6.11-rc3.
>
> We have bisected the issue together down to the following commit:
>
> d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
>
> Reverting this commit on top of 6.11-rc3 fixes the issue.
>
> While we were still debugging the issue Brad (also CC'ed) messaged me
> that they were seeing similar failures in their ESXi based test
> pipelines except for one box that was running on legacy BIOS (so maybe
> that is relevant). They noticed this because they had set panic_on_warn.
>
> Cheers,
> Chris
>
> ---
>
> #regzbot introduced: d6667f0ddf46
> #regzbot title: drm/vmwgfx: driver crashes due to command buffer error
> #regzbot link: https://bbs.archlinux.org/viewtopic.php?id=298491
>
> ---
>
> dmesg snippet:
> [   13.297084] [ cut here ]
> [   13.297086] Command buffer error.
> [   13.297139] WARNING: CPU: 0 PID: 186 at 
> drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c:399 vmw_cmdbuf_ctx_process+0x268/0x270 
> [vmwgfx]
> [   13.297160] Modules linked in: uas usb_storage hid_generic usbhid mptspi 
> sr_mod cdrom scsi_transport_spi vmwgfx serio_raw mptscsih ata_generic atkbd 
> drm_ttm_helper libps2 pata_acpi vivaldi_fmap ttm mptbase crc32c_intel 
> xhci_pci intel_agp xhci_pci_renesas ata_piix intel_gtt i8042 serio
> [   13.297172] CPU: 0 PID: 186 Comm: irq/16-vmwgfx Not tainted 6.10.4-arch2-1 
> #1 517ed45cc9c4492ee5d5bfc2d2fe6ef1f2e7a8eb
> [   13.297174] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
> Desktop Reference Platform, BIOS 6.00 11/12/2020
> [   13.297175] RIP: 0010:vmw_cmdbuf_ctx_process+0x268/0x270 [vmwgfx]
> [   13.297186] Code: 01 00 01 e8 ba 8c 4f f9 0f 0b 4c 89 ff e8 40 fb ff ff e9 
> 9d fe ff ff 48 c7 c7 99 d9 3f c0 c6 05 52 2f 01 00 01 e8 98 8c 4f f9 <0f> 0b 
> e9 1f fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [   13.297187] RSP: 0018:b9c1805e3d78 EFLAGS: 00010282
> [   13.297188] RAX:  RBX: 0003 RCX: 
> 0003
> [   13.297189] RDX:  RSI: 0003 RDI: 
> 0001
> [   13.297190] RBP: 907fc8274c98 R08:  R09: 
> b9c1805e3bf8
> [   13.297191] R10: 9086dbdfffa8 R11: 0003 R12: 
> 907fc4db5b00
> [   13.297192] R13: 907fc83fd318 R14: 907fc8274c88 R15: 
> 907fc83fd300
> [   13.297193] FS:  () GS:9086dbe0() 
> knlGS:
> [   13.297194] CS:  0010 DS:  ES:  CR0: 80050033
> [   13.297194] CR2: 774dc57671ca CR3: 0006b9e20005 CR4: 
> 003706f0
> [   13.297196] Call Trace:
> [   13.297198]  
> [   13.297199]  ? vmw_cmdbuf_ctx_process+0x268/0x270 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297209]  ? __warn.cold+0x8e/0xe8
> [   13.297211]  ? vmw_cmdbuf_ctx_process+0x268/0x270 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297221]  ? report_bug+0xff/0x140
> [   13.297222]  ? console_unlock+0x84/0x130
> [   13.297225]  ? handle_bug+0x3c/0x80
> [   13.297226]  ? exc_invalid_op+0x17/0x70
> [   13.297227]  ? asm_exc_invalid_op+0x1a/0x20
> [   13.297230]  ? vmw_cmdbuf_ctx_process+0x268/0x270 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297238]  ? vmw_cmdbuf_ctx_process+0x268/0x270 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297245]  vmw_cmdbuf_man_process+0x5d/0x100 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297253]  vmw_cmdbuf_irqthread+0x25/0x30 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297261]  vmw_thread_fn+0x3a/0x70 [vmwgfx 
> a4fe13044bca4eda782d964fb8c4ca15afb325e9]
> [   13.297271]  irq_thread_fn+0x20/0x60
> [   13.297273]  irq_thread+0x18a/0x270
> [   13.297274]  ? __pfx_irq_thread_fn+0x10/0x10
> [   13.297276]  ? __pfx_irq_thread_dtor+0x10/0x10
> [   13.297277]  ? __pfx_irq_thread+0x10/0x10
> [   13.297278]  kthread+0xcf/0x100
> [   13.297281]  ? __pfx_kthread+0x10/0x10
> [   13.297282]  ret_from_fork+0x31/0x50
> [   13.297285]  ? __pfx_kthread+0x10/0x10
> [   13.297286]  ret_from_fork_asm+0x1a/0x30
> [   13.297288]  
> [   13.297289] ---[ end trace  ]---

Hi, Christian.

Thanks for the report! So just to be clear: vmwgfx doesn't crash, but
it shows a warning, and the kernel has been compiled with
panic_on_warn, which is what actually panics, right?

I haven't seen this on any of our systems so I'm guessing the affected
systems aren't running gnome/kde? Is there any chance I could see the
full "journalctl -b" log and the vmware.log file associated with those
warnings? They could give me some clues on how to reproduce this.

z


[PATCH v3 2/2] drm/vmwgfx: Fix prime with external buffers

2024-08-15 Thread Zack Rusin
Make sure that for external buffers mapping goes through the dma_buf
interface instead of trying to access pages directly.

External buffers might not provide direct access to readable/writable
pages, so to make sure the bo's created from external dma_bufs can be
read, the dma_buf interface has to be used.

Fixes crashes in IGT's kms_prime with vgem. Regular desktop usage won't
trigger this due to the fact that virtual machines will not have
multiple GPUs, but it enables better test coverage in IGT.

v2: Fix the diff rectangle computation

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc:  # v6.6+
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 112 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  12 +--
 3 files changed, 116 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index 717d624e9a05..4049447d211c 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -27,6 +27,8 @@
  **/
 
 #include "vmwgfx_drv.h"
+
+#include "vmwgfx_bo.h"
 #include 
 
 /*
@@ -420,13 +422,103 @@ static int vmw_bo_cpu_blit_line(struct 
vmw_bo_blit_line_data *d,
return 0;
 }
 
+static void *map_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   struct vmw_private *vmw =
+   container_of(bo->tbo.bdev, struct vmw_private, bdev);
+   void *ptr = NULL;
+   int ret;
+
+   if (bo->tbo.base.import_attach) {
+   ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
+   if (ret) {
+   drm_dbg_driver(&vmw->drm,
+  "Wasn't able to map external bo!\n");
+   goto out;
+   }
+   ptr = map->vaddr;
+   } else {
+   ptr = vmw_bo_map_and_cache(bo);
+   }
+
+out:
+   return ptr;
+}
+
+static void unmap_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   if (bo->tbo.base.import_attach)
+   dma_buf_vunmap(bo->tbo.base.dma_buf, map);
+   else
+   vmw_bo_unmap(bo);
+}
+
+static int vmw_external_bo_copy(struct vmw_bo *dst, u32 dst_offset,
+   u32 dst_stride, struct vmw_bo *src,
+   u32 src_offset, u32 src_stride,
+   u32 width_in_bytes, u32 height,
+   struct vmw_diff_cpy *diff)
+{
+   struct vmw_private *vmw =
+   container_of(dst->tbo.bdev, struct vmw_private, bdev);
+   size_t dst_size = dst->tbo.resource->size;
+   size_t src_size = src->tbo.resource->size;
+   struct iosys_map dst_map = {0};
+   struct iosys_map src_map = {0};
+   int ret, i;
+   u8 *vsrc;
+   u8 *vdst;
+
+   vsrc = map_external(src, &src_map);
+   if (!vsrc) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map src\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vdst = map_external(dst, &dst_map);
+   if (!vdst) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map dst\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vsrc += src_offset;
+   vdst += dst_offset;
+   if (src_stride == dst_stride) {
+   dst_size -= dst_offset;
+   src_size -= src_offset;
+   memcpy(vdst, vsrc,
+  min(dst_stride * height, min(dst_size, src_size)));
+   } else {
+   WARN_ON(dst_stride < width_in_bytes);
+   for (i = 0; i < height; ++i) {
+   memcpy(vdst, vsrc, width_in_bytes);
+   vsrc += src_stride;
+   vdst += dst_stride;
+   }
+   }
+
+   diff->rect.x1 = (dst_offset % dst_stride) / diff->cpp;
+   diff->rect.y1 = floor(dst_offset / dst_stride);
+   diff->rect.x2 = diff->rect.x1 + width_in_bytes / diff->cpp;
+   diff->rect.y2 = diff->rect.y1 + height;
+
+   ret = 0;
+out:
+   unmap_external(src, &src_map);
+   unmap_external(dst, &dst_map);
+
+   return ret;
+}
+
 /**
  * vmw_bo_cpu_blit - in-kernel cpu blit.
  *
- * @dst: Destination buffer object.
+ * @vmw_dst: Destination buffer object.
  * @dst_offset: Destination offset of blit start in bytes.
  * @dst_stride: Destination stride in bytes.
- * @src: Source buffer object.
+ * @vmw_src: Source buffer object.
  * @src_offset: Source offset of blit start in bytes.
  * @src_stride: Source stride in bytes.
  * @w: Width of blit.
@@ -444,

[PATCH v3 1/2] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-15 Thread Zack Rusin
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.

Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.

v2: Update doc strings

Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.19+
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t 
size)
void *virtual;
int ret;
 
+   atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, 
size_t size)
  */
 void vmw_bo_unmap(struct vmw_bo *vbo)
 {
+   int map_count;
+
if (vbo->map.bo == NULL)
return;
 
-   ttm_bo_kunmap(&vbo->map);
-   vbo->map.bo = NULL;
+   map_count = atomic_dec_return(&vbo->map_count);
+
+   if (!map_count) {
+   ttm_bo_kunmap(&vbo->map);
+   vbo->map.bo = NULL;
+   }
 }
 
 
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+   atomic_set(&vmw_bo->map_count, 0);
 
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
  * @map: Kmap object for semi-persistent mappings
  * @res_tree: RB tree of resources using this buffer object as a backing MOB
  * @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
  * @cpu_writers: Number of synccpu write grabs. Protected by reservation when
  * increased. May be decreased without reservation.
  * @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
 
+   atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted.  Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
-- 
2.43.0



[PATCH v2 2/2] drm/vmwgfx: Fix prime with external buffers

2024-08-14 Thread Zack Rusin
Make sure that for external buffers mapping goes through the dma_buf
interface instead of trying to access pages directly.

External buffers might not provide direct access to readable/writable
pages, so to make sure the bo's created from external dma_bufs can be
read, the dma_buf interface has to be used.

Fixes crashes in IGT's kms_prime with vgem. Regular desktop usage won't
trigger this due to the fact that virtual machines will not have
multiple GPUs, but it enables better test coverage in IGT.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc:  # v6.6+
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 112 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  12 +--
 3 files changed, 116 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index 717d624e9a05..3140414d027e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -27,6 +27,8 @@
  **/
 
 #include "vmwgfx_drv.h"
+
+#include "vmwgfx_bo.h"
 #include 
 
 /*
@@ -420,13 +422,103 @@ static int vmw_bo_cpu_blit_line(struct 
vmw_bo_blit_line_data *d,
return 0;
 }
 
+static void *map_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   struct vmw_private *vmw =
+   container_of(bo->tbo.bdev, struct vmw_private, bdev);
+   void *ptr = NULL;
+   int ret;
+
+   if (bo->tbo.base.import_attach) {
+   ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
+   if (ret) {
+   drm_dbg_driver(&vmw->drm,
+  "Wasn't able to map external bo!\n");
+   goto out;
+   }
+   ptr = map->vaddr;
+   } else {
+   ptr = vmw_bo_map_and_cache(bo);
+   }
+
+out:
+   return ptr;
+}
+
+static void unmap_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   if (bo->tbo.base.import_attach)
+   dma_buf_vunmap(bo->tbo.base.dma_buf, map);
+   else
+   vmw_bo_unmap(bo);
+}
+
+static int vmw_external_bo_copy(struct vmw_bo *dst, u32 dst_offset,
+   u32 dst_stride, struct vmw_bo *src,
+   u32 src_offset, u32 src_stride,
+   u32 width_in_bytes, u32 height,
+   struct vmw_diff_cpy *diff)
+{
+   struct vmw_private *vmw =
+   container_of(dst->tbo.bdev, struct vmw_private, bdev);
+   size_t dst_size = dst->tbo.resource->size;
+   size_t src_size = src->tbo.resource->size;
+   struct iosys_map dst_map = {0};
+   struct iosys_map src_map = {0};
+   int ret, i;
+   u8 *vsrc;
+   u8 *vdst;
+
+   vsrc = map_external(src, &src_map);
+   if (!vsrc) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map src\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vdst = map_external(dst, &dst_map);
+   if (!vdst) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map dst\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vsrc += src_offset;
+   vdst += dst_offset;
+   if (src_stride == dst_stride) {
+   dst_size -= dst_offset;
+   src_size -= src_offset;
+   memcpy(vdst, vsrc,
+  min(dst_stride * height, min(dst_size, src_size)));
+   } else {
+   WARN_ON(dst_stride < width_in_bytes);
+   for (i = 0; i < height; ++i) {
+   memcpy(vdst, vsrc, width_in_bytes);
+   vsrc += src_stride;
+   vdst += dst_stride;
+   }
+   }
+
+   diff->rect.y1 = dst_offset % dst_stride;
+   diff->rect.x1 = (dst_offset - dst_offset * diff->rect.y1) / diff->cpp;
+   diff->rect.x2 = diff->rect.x1 + width_in_bytes / diff->cpp;
+   diff->rect.y2 = diff->rect.y1 + height;
+
+   ret = 0;
+out:
+   unmap_external(src, &src_map);
+   unmap_external(dst, &dst_map);
+
+   return ret;
+}
+
 /**
  * vmw_bo_cpu_blit - in-kernel cpu blit.
  *
- * @dst: Destination buffer object.
+ * @vmw_dst: Destination buffer object.
  * @dst_offset: Destination offset of blit start in bytes.
  * @dst_stride: Destination stride in bytes.
- * @src: Source buffer object.
+ * @vmw_src: Source buffer object.
  * @src_offset: Source offset of blit start in bytes.
  * @src_stride: Source stride in bytes.
  * @w: Width of blit.
@@ -444,13 +536,15 @@ static i

[PATCH v2 1/2] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-14 Thread Zack Rusin
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.

Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.

v2: Update doc strings

Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.19+
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t 
size)
void *virtual;
int ret;
 
+   atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, 
size_t size)
  */
 void vmw_bo_unmap(struct vmw_bo *vbo)
 {
+   int map_count;
+
if (vbo->map.bo == NULL)
return;
 
-   ttm_bo_kunmap(&vbo->map);
-   vbo->map.bo = NULL;
+   map_count = atomic_dec_return(&vbo->map_count);
+
+   if (!map_count) {
+   ttm_bo_kunmap(&vbo->map);
+   vbo->map.bo = NULL;
+   }
 }
 
 
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+   atomic_set(&vmw_bo->map_count, 0);
 
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
  * @map: Kmap object for semi-persistent mappings
  * @res_tree: RB tree of resources using this buffer object as a backing MOB
  * @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
  * @cpu_writers: Number of synccpu write grabs. Protected by reservation when
  * increased. May be decreased without reservation.
  * @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
 
+   atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted.  Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
-- 
2.43.0



Re: [PATCH 0/3] drm/vmwgfx: Add support for userspace managed surfaces.

2024-08-13 Thread Zack Rusin
On Mon, Aug 12, 2024 at 3:16 PM Maaz Mombasawala
 wrote:
>
> This series introduces basic support for userspace managed surfaces. The
> lifetime and id's of these surfaces are managed by userspace-submitted
> commands instead of relying on the kernel to manage them.
>
> Maaz Mombasawala (3):
>   drm/vmwgfx: Introduce userspace managed surfaces
>   drm/vmwgfx: Support hw_destroy for userspace managed surfaces
>   drm/vmwgfx: Add support for older define commands for userspace
> surfaces
>
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h |  24 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 331 ++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 196 +-
>  3 files changed, 518 insertions(+), 33 deletions(-)
>

In general that looks great. Do you happen to have the userspace patch
somewhere where we could see it? In particular there are three things
I'm wondering about:
1) In the first patch you correctly mark the gb surface as
may_evict = false, because if user space is the thing that attaches
mob's then the kernel can not switch them underneath, but then I'd
like to see how the memory pressure situations are handled on the
user side,
2) Since we now allow surface destroy commands from userspace, could
one trigger a kernel oops when running old surface defines with the
mob_create flag set and then issuing the gb surface destroy, or will
the res->id be reset properly?
3) How is userspace able to select whether it should self-manage the
mob's or let the kernel do it? I.e. what flag signifies that the
userspace is running on a kernel that is capable of handling this?

z


Re: [PATCH 1/2] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-13 Thread Zack Rusin
On Tue, Aug 13, 2024 at 1:56 PM Ian Forbes  wrote:
>
> In that case move `map_count` above `map` which should move it to a
> separate cache line and update the doc strings as needed.

Sorry, I'm not sure I understand. What are you trying to fix?

z


Re: [PATCH 1/2] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-13 Thread Zack Rusin
On Tue, Aug 13, 2024 at 1:29 PM Ian Forbes  wrote:
>
> Remove `busy_places` now that it's unused. There's also probably a
> better place to put `map_count` in the struct layout to avoid false
> sharing with `cpu_writers`. I'd repack the whole struct if we're going
> to be adding and removing fields.

Those are not related to this change. They'd be two separate changes:
one to remove other unused members and another to relayout the struct.

z


Re: [PATCH] drm/vmwgfx: Handle possible ENOMEM in vmw_stdu_connector_atomic_check

2024-08-13 Thread Zack Rusin
On Fri, Aug 9, 2024 at 2:38 PM Ian Forbes  wrote:
>
> Handle the unlikely ENOMEM condition and other errors in
> vmw_stdu_connector_atomic_check.
>
> Signed-off-by: Ian Forbes 
> Reported-by: Dan Carpenter 
> Fixes: 75c3e8a26a35 ("drm/vmwgfx: Trigger a modeset when the screen moves")
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> index 571e157fe22e9..3223fd278a598 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> @@ -886,6 +886,10 @@ static int vmw_stdu_connector_atomic_check(struct 
> drm_connector *conn,
> struct drm_crtc_state *new_crtc_state;
>
> conn_state = drm_atomic_get_connector_state(state, conn);
> +
> +   if (IS_ERR(conn_state))
> +   return PTR_ERR(conn_state);
> +
> du = vmw_connector_to_stdu(conn);
>
> if (!conn_state->crtc)
> --
> 2.34.1
>

Reviewed-by: Zack Rusin 

z


Re: [PATCH] drm/vmwgfx: Limit display layout ioctl array size to VMWGFX_NUM_DISPLAY_UNITS

2024-08-13 Thread Zack Rusin
On Thu, Aug 8, 2024 at 4:06 PM Ian Forbes  wrote:
>
> Currently the array size is only limited by the largest kmalloc size, which
> is incorrect. This change will also return a more specific error message
> than ENOMEM to userspace.
>
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 4 ++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 4 +++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 3 ---
>  3 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> index 32f50e595809..888349f2aac1 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> @@ -62,7 +62,7 @@
>  #define VMWGFX_DRIVER_MINOR 20
>  #define VMWGFX_DRIVER_PATCHLEVEL 0
>  #define VMWGFX_FIFO_STATIC_SIZE (1024*1024)
> -#define VMWGFX_MAX_DISPLAYS 16
> +#define VMWGFX_NUM_DISPLAY_UNITS 8
>  #define VMWGFX_CMD_BOUNCE_INIT_SIZE 32768
>
>  #define VMWGFX_MIN_INITIAL_WIDTH 1280
> @@ -82,7 +82,7 @@
>  #define VMWGFX_NUM_GB_CONTEXT 256
>  #define VMWGFX_NUM_GB_SHADER 2
>  #define VMWGFX_NUM_GB_SURFACE 32768
> -#define VMWGFX_NUM_GB_SCREEN_TARGET VMWGFX_MAX_DISPLAYS
> +#define VMWGFX_NUM_GB_SCREEN_TARGET VMWGFX_NUM_DISPLAY_UNITS
>  #define VMWGFX_NUM_DXCONTEXT 256
>  #define VMWGFX_NUM_DXQUERY 512
>  #define VMWGFX_NUM_MOB (VMWGFX_NUM_GB_CONTEXT +\
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 288ed0bb75cb..884804274dfb 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2225,7 +2225,7 @@ int vmw_kms_update_layout_ioctl(struct drm_device *dev, 
> void *data,
> struct drm_mode_config *mode_config = &dev->mode_config;
> struct drm_vmw_update_layout_arg *arg =
> (struct drm_vmw_update_layout_arg *)data;
> -   void __user *user_rects;
> +   const void __user *user_rects;
> struct drm_vmw_rect *rects;
> struct drm_rect *drm_rects;
> unsigned rects_size;
> @@ -2237,6 +2237,8 @@ int vmw_kms_update_layout_ioctl(struct drm_device *dev, 
> void *data,
> VMWGFX_MIN_INITIAL_HEIGHT};
> vmw_du_update_layout(dev_priv, 1, &def_rect);
> return 0;
> +   } else if (arg->num_outputs > VMWGFX_NUM_DISPLAY_UNITS) {
> +   return -E2BIG;
> }
>
> rects_size = arg->num_outputs * sizeof(struct drm_vmw_rect);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
> index 6141fadf81ef..2a6c6d6581e0 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
> @@ -199,9 +199,6 @@ struct vmw_kms_dirty {
> s32 unit_y2;
>  };
>
> -#define VMWGFX_NUM_DISPLAY_UNITS 8
> -
> -
>  #define vmw_framebuffer_to_vfb(x) \
> container_of(x, struct vmw_framebuffer, base)
>  #define vmw_framebuffer_to_vfbs(x) \
> --
> 2.34.1
>

Looks good. Thanks.

Reviewed-by: Zack Rusin 

z


[PATCH 2/2] drm/vmwgfx: Fix prime with external buffers

2024-08-01 Thread Zack Rusin
Make sure that for external buffers mapping goes through the dma_buf
interface instead of trying to access pages directly.

External buffers might not provide direct access to readable/writable
pages, so to make sure the bo's created from external dma_bufs can be
read, the dma_buf interface has to be used.

Fixes crashes in IGT's kms_prime with vgem. Regular desktop usage won't
trigger this due to the fact that virtual machines will not have
multiple GPUs, but it enables better test coverage in IGT.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc:  # v6.6+
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 112 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  12 +--
 3 files changed, 116 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index 717d624e9a05..3140414d027e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -27,6 +27,8 @@
  **/
 
 #include "vmwgfx_drv.h"
+
+#include "vmwgfx_bo.h"
 #include 
 
 /*
@@ -420,13 +422,103 @@ static int vmw_bo_cpu_blit_line(struct 
vmw_bo_blit_line_data *d,
return 0;
 }
 
+static void *map_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   struct vmw_private *vmw =
+   container_of(bo->tbo.bdev, struct vmw_private, bdev);
+   void *ptr = NULL;
+   int ret;
+
+   if (bo->tbo.base.import_attach) {
+   ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
+   if (ret) {
+   drm_dbg_driver(&vmw->drm,
+  "Wasn't able to map external bo!\n");
+   goto out;
+   }
+   ptr = map->vaddr;
+   } else {
+   ptr = vmw_bo_map_and_cache(bo);
+   }
+
+out:
+   return ptr;
+}
+
+static void unmap_external(struct vmw_bo *bo, struct iosys_map *map)
+{
+   if (bo->tbo.base.import_attach)
+   dma_buf_vunmap(bo->tbo.base.dma_buf, map);
+   else
+   vmw_bo_unmap(bo);
+}
+
+static int vmw_external_bo_copy(struct vmw_bo *dst, u32 dst_offset,
+   u32 dst_stride, struct vmw_bo *src,
+   u32 src_offset, u32 src_stride,
+   u32 width_in_bytes, u32 height,
+   struct vmw_diff_cpy *diff)
+{
+   struct vmw_private *vmw =
+   container_of(dst->tbo.bdev, struct vmw_private, bdev);
+   size_t dst_size = dst->tbo.resource->size;
+   size_t src_size = src->tbo.resource->size;
+   struct iosys_map dst_map = {0};
+   struct iosys_map src_map = {0};
+   int ret, i;
+   u8 *vsrc;
+   u8 *vdst;
+
+   vsrc = map_external(src, &src_map);
+   if (!vsrc) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map src\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vdst = map_external(dst, &dst_map);
+   if (!vdst) {
+   drm_dbg_driver(&vmw->drm, "Wasn't able to map dst\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   vsrc += src_offset;
+   vdst += dst_offset;
+   if (src_stride == dst_stride) {
+   dst_size -= dst_offset;
+   src_size -= src_offset;
+   memcpy(vdst, vsrc,
+  min(dst_stride * height, min(dst_size, src_size)));
+   } else {
+   WARN_ON(dst_stride < width_in_bytes);
+   for (i = 0; i < height; ++i) {
+   memcpy(vdst, vsrc, width_in_bytes);
+   vsrc += src_stride;
+   vdst += dst_stride;
+   }
+   }
+
+   diff->rect.y1 = dst_offset % dst_stride;
+   diff->rect.x1 = (dst_offset - dst_offset * diff->rect.y1) / diff->cpp;
+   diff->rect.x2 = diff->rect.x1 + width_in_bytes / diff->cpp;
+   diff->rect.y2 = diff->rect.y1 + height;
+
+   ret = 0;
+out:
+   unmap_external(src, &src_map);
+   unmap_external(dst, &dst_map);
+
+   return ret;
+}
+
 /**
  * vmw_bo_cpu_blit - in-kernel cpu blit.
  *
- * @dst: Destination buffer object.
+ * @vmw_dst: Destination buffer object.
  * @dst_offset: Destination offset of blit start in bytes.
  * @dst_stride: Destination stride in bytes.
- * @src: Source buffer object.
+ * @vmw_src: Source buffer object.
  * @src_offset: Source offset of blit start in bytes.
  * @src_stride: Source stride in bytes.
  * @w: Width of blit.
@@ -444,13 +536,15 @@ static i

[PATCH 1/2] drm/vmwgfx: Prevent unmapping active read buffers

2024-08-01 Thread Zack Rusin
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in a simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.

Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.

Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.19+
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t 
size)
void *virtual;
int ret;
 
+   atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, &not_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, 
size_t size)
  */
 void vmw_bo_unmap(struct vmw_bo *vbo)
 {
+   int map_count;
+
if (vbo->map.bo == NULL)
return;
 
-   ttm_bo_kunmap(&vbo->map);
-   vbo->map.bo = NULL;
+   map_count = atomic_dec_return(&vbo->map_count);
+
+   if (!map_count) {
+   ttm_bo_kunmap(&vbo->map);
+   vbo->map.bo = NULL;
+   }
 }
 
 
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+   atomic_set(&vmw_bo->map_count, 0);
 
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..dc13f1e996c1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -90,6 +90,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
 
+   atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted.  Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
-- 
2.43.0



Re: [PATCH] drm/plane: Fix IS_ERR() vs NULL bug

2024-07-29 Thread Zack Rusin
On Sat, Jul 27, 2024 at 1:32 AM Dan Carpenter  wrote:
>
> The drm_property_create_signed_range() function returns NULL on error,
> it doesn't return error pointers.  Change the IS_ERR() tests to check
> for NULL.
>
> Fixes: 8f7179a1027d ("drm/atomic: Add support for mouse hotspots")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/drm_plane.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
> index a28b22fdd7a4..4fcb5d486de6 100644
> --- a/drivers/gpu/drm/drm_plane.c
> +++ b/drivers/gpu/drm/drm_plane.c
> @@ -328,14 +328,14 @@ static int drm_plane_create_hotspot_properties(struct 
> drm_plane *plane)
>
> prop_x = drm_property_create_signed_range(plane->dev, 0, "HOTSPOT_X",
>   INT_MIN, INT_MAX);
> -   if (IS_ERR(prop_x))
> -   return PTR_ERR(prop_x);
> +   if (!prop_x)
> +   return -ENOMEM;
>
> prop_y = drm_property_create_signed_range(plane->dev, 0, "HOTSPOT_Y",
>   INT_MIN, INT_MAX);
> -   if (IS_ERR(prop_y)) {
> +   if (!prop_y) {
> drm_property_destroy(plane->dev, prop_x);
> -   return PTR_ERR(prop_y);
> +   return -ENOMEM;
> }
>
> drm_object_attach_property(&plane->base, prop_x, 0);

Thanks, that looks good to me.

Reviewed-by: Zack Rusin 

z
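
The underlying pitfall: DRM property constructors report failure with
NULL, while other kernel allocators report it with ERR_PTR() values, so
the check must match the convention. A minimal sketch of the two (the
second caller is hypothetical):

    /* NULL on failure -- drm_property_create_signed_range() and friends. */
    prop = drm_property_create_signed_range(dev, 0, "HOTSPOT_X",
                                            INT_MIN, INT_MAX);
    if (!prop)
        return -ENOMEM;

    /* ERR_PTR() on failure -- only here do IS_ERR()/PTR_ERR() apply. */
    obj = alloc_returning_err_ptr();    /* hypothetical allocator */
    if (IS_ERR(obj))
        return PTR_ERR(obj);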


Re: [PATCH] drm/vmwgfx: Fix overlay when using Screen Targets

2024-07-24 Thread Zack Rusin
On Fri, Jul 19, 2024 at 12:37 PM Ian Forbes  wrote:
>
> This code was never updated to support Screen Targets.
> Fixes a bug where Xv playback displays a green screen instead of actual
> video contents when 3D acceleration is disabled in the guest.
>
> Fixes: c8261a961ece ("vmwgfx: Major KMS refactoring / cleanup in preparation 
> of screen targets")
> Reported-by: Doug Brown 
> Closes: 
> https://lore.kernel.org/all/bd9cb3c7-90e8-435d-bc28-0e38fee58...@schmorgal.com
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
> index c45b4724e4141..e20f64b67b266 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
> @@ -92,7 +92,7 @@ static int vmw_overlay_send_put(struct vmw_private 
> *dev_priv,
>  {
> struct vmw_escape_video_flush *flush;
> size_t fifo_size;
> -   bool have_so = (dev_priv->active_display_unit == 
> vmw_du_screen_object);
> +   bool have_so = (dev_priv->active_display_unit != vmw_du_legacy);
> int i, num_items;
> SVGAGuestPtr ptr;

Thanks. I pushed it to drm-misc-fixes.

z


[PATCH] drm/vmwgfx: Bump the minor version of the driver

2024-07-22 Thread Zack Rusin
Provide a way to query for the fixed dumb buffer support in kms.

Lets the mesa svga driver return a buffer id instead of a surface id
from resource_to_handle, which fixes a lot of userspace apps that assume
those handles are gem buffers.
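
Userspace can probe for the fix by checking the driver version; a
minimal sketch using libdrm (the device path is an assumption, error
handling elided):

    #include <fcntl.h>
    #include <stdio.h>
    #include <xf86drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR); /* assumes vmwgfx here */
        drmVersionPtr v = drmGetVersion(fd);

        if (v) {
            /* vmwgfx >= 2.21 returns buffer ids from resource_to_handle */
            int fixed = v->version_major > 2 ||
                        (v->version_major == 2 && v->version_minor >= 21);
            printf("%s %d.%d: fixed dumb buffers: %s\n", v->name,
                   v->version_major, v->version_minor,
                   fixed ? "yes" : "no");
            drmFreeVersion(v);
        }
        return 0;
    }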

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 8de973549b5e..ced881fdca4a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -57,9 +57,9 @@
 
 
 #define VMWGFX_DRIVER_NAME "vmwgfx"
-#define VMWGFX_DRIVER_DATE "20211206"
+#define VMWGFX_DRIVER_DATE "20240722"
 #define VMWGFX_DRIVER_MAJOR 2
-#define VMWGFX_DRIVER_MINOR 20
+#define VMWGFX_DRIVER_MINOR 21
 #define VMWGFX_DRIVER_PATCHLEVEL 0
 #define VMWGFX_FIFO_STATIC_SIZE (1024*1024)
 #define VMWGFX_MAX_DISPLAYS 16
-- 
2.43.0



[PATCH v5 4/4] drm/vmwgfx: Add basic support for external buffers

2024-07-22 Thread Zack Rusin
Make vmwgfx go through the dma-buf interface to map/unmap imported
buffers. The driver used to try to directly manipulate external
buffers, assuming that everything that was coming to it had to live
in cpu accessible memory. While technically true because what's in the
vms is controlled by us, it's semantically completely broken.

Fix importing of external buffers by forwarding all memory access
requests to the importer.

Tested by the vmw_prime basic_vgem test.
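
Roughly what the test exercises: a dma-buf exported by another driver
(vgem) is imported into vmwgfx, after which CPU access must go through
the importer paths added below. A sketch of the import step using libdrm
(prime_fd is assumed to come from the exporter; error handling elided):

    #include <stdint.h>
    #include <xf86drm.h>

    /* Turn an exporter's dma-buf fd into a vmwgfx gem handle. */
    static int import_prime(int vmw_fd, int prime_fd, uint32_t *handle)
    {
        return drmPrimeFDToHandle(vmw_fd, prime_fd, handle);
    }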

Signed-off-by: Zack Rusin 
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 62 +++--
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index 07185c108218..b9857f37ca1a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /*
- * Copyright 2021-2023 VMware, Inc.
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -78,6 +79,59 @@ static struct sg_table *vmw_gem_object_get_sg_table(struct 
drm_gem_object *obj)
return drm_prime_pages_to_sg(obj->dev, vmw_tt->dma_ttm.pages, 
vmw_tt->dma_ttm.num_pages);
 }
 
+static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
+   int ret;
+
+   if (obj->import_attach) {
+   ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
+   if (!ret) {
+   if (drm_WARN_ON(obj->dev, map->is_iomem)) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   return -EIO;
+   }
+   }
+   } else {
+   ret = ttm_bo_vmap(bo, map);
+   }
+
+   return ret;
+}
+
+static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   if (obj->import_attach)
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   else
+   drm_gem_ttm_vunmap(obj, map);
+}
+
+static int vmw_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+   int ret;
+
+   if (obj->import_attach) {
+   /*
+* Reset both vm_ops and vm_private_data, so we don't end up 
with
+* vm_ops pointing to our implementation if the dma-buf backend
+* doesn't set those fields.
+*/
+   vma->vm_private_data = NULL;
+   vma->vm_ops = NULL;
+
+   ret = dma_buf_mmap(obj->dma_buf, vma, 0);
+
+   /* Drop the reference drm_gem_mmap_obj() acquired.*/
+   if (!ret)
+   drm_gem_object_put(obj);
+
+   return ret;
+   }
+
+   return drm_gem_ttm_mmap(obj, vma);
+}
+
 static const struct vm_operations_struct vmw_vm_ops = {
.pfn_mkwrite = vmw_bo_vm_mkwrite,
.page_mkwrite = vmw_bo_vm_mkwrite,
@@ -94,9 +148,9 @@ static const struct drm_gem_object_funcs 
vmw_gem_object_funcs = {
.pin = vmw_gem_object_pin,
.unpin = vmw_gem_object_unpin,
.get_sg_table = vmw_gem_object_get_sg_table,
-   .vmap = drm_gem_ttm_vmap,
-   .vunmap = drm_gem_ttm_vunmap,
-   .mmap = drm_gem_ttm_mmap,
+   .vmap = vmw_gem_vmap,
+   .vunmap = vmw_gem_vunmap,
+   .mmap = vmw_gem_mmap,
.vm_ops = &vmw_vm_ops,
 };
 
-- 
2.43.0



[PATCH v5 3/4] drm/vmwgfx: Fix handling of dumb buffers

2024-07-22 Thread Zack Rusin
Dumb buffers can be used in kms but also through prime with gallium's
resource_from_handle. In the second case the dumb buffers can be
rendered by the GPU where with the regular DRM kms interfaces they
are mapped and written to by the CPU. Because the same buffer can
be written to by the GPU and CPU vmwgfx needs to use vmw_surface (object
which properly tracks dirty state of the guest and gpu memory)
instead of vmw_bo (which is just guest side memory).

Furthermore the dumb buffer handles are expected to be gem objects by
a lot of userspace.

Make vmwgfx accept gem handles in prime and kms but internally switch
to vmw_surface's to properly track the dirty state of the objects between
the GPU and CPU.

Fixes new KWin and KDE on Wayland.
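
The userspace expectation being fixed can be sketched with the generic,
driver-independent dumb-buffer flow (error handling elided):

    #include <stdint.h>
    #include <xf86drm.h>

    static int dumb_roundtrip(int fd)
    {
        struct drm_mode_create_dumb create = {
            .width = 64, .height = 64, .bpp = 32,
        };
        struct drm_gem_close gclose = { 0 };
        int ret;

        ret = drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);
        if (ret)
            return ret;

        /* New userspace assumes the returned handle is a gem handle. */
        gclose.handle = create.handle;
        return drmIoctl(fd, DRM_IOCTL_GEM_CLOSE, &gclose);
    }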

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 502 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 280 +++-
 12 files changed, 740 insertions(+), 502 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h 
b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
index b0d87c5f58d8..1ac3cb151b11 100644
--- a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
+++ b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
@@ -1,6 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
- * Copyright 2021 VMware, Inc.
- * SPDX-License-Identifier: GPL-2.0 OR MIT
+ *
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -31,6 +33,10 @@
 
 #include 
 
+#define SVGA3D_FLAGS_UPPER_32(svga3d_flags) ((svga3d_flags) >> 32)
+#define SVGA3D_FLAGS_LOWER_32(svga3d_flags) \
+   ((svga3d_flags) & ((uint64_t)U32_MAX))
+
 static inline u32 clamped_umul32(u32 a, u32 b)
 {
uint64_t tmp = (uint64_t) a*b;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 00144632c600..f42ebc4a7c22 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright © 2011-2023 VMware, Inc., Palo Alto, CA., USA
- * All Rights Reserved.
+ * Copyright (c) 2011-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -28,15 +28,39 @@
 
 #include "vmwgfx_bo.h"
 #include "vmwgfx_drv.h"
-
+#include "vmwgfx_resource_priv.h"
 
 #include 
 
 static void vmw_bo_release(struct vmw_bo *vbo)
 {
+   struct vmw_resource *res;
+
WARN_ON(vbo->tbo.base.funcs &&
kref_read(&vbo->tbo.base.refcount) != 0);
vmw_bo_unmap(vbo);
+
+   xa_destroy(&vbo->detached_resources);
+   WARN_ON(vbo->is_dumb && !vbo->dumb_surface);
+   if (vbo->is_dumb && vbo->dumb_surface) {
+   res = &vbo->dumb_surface->res;
+   WARN_ON(vbo != res->guest_memory_bo);
+   WARN_ON(!res->guest_memory_bo);
+   if (res->guest_memory_bo) {
+   /* Reserve and switch the backing mob. */
+   mutex_lock(&res->dev_priv->cmdbuf_mutex);
+   (void)vmw_resource_reserve(res, false, true);
+   vmw_resource_mob_detach(res);
+   if (res->coherent)
+   vmw_bo_dirty_release(res->guest_memory_bo);
+   res->guest_memory_bo = NULL;
+   res->guest_memory_offset = 0;
+   vmw_resource_unreserve(res, false, false, false, NULL,
+  0);
+   mutex_unlock(&res->dev_priv->cmdbuf_mutex);
+   }
+   vmw_surface_un

[PATCH v5 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-07-22 Thread Zack Rusin
Fix race issues in virtual crc generation by making sure the surface
the code uses for crc computation is properly ref counted.

Crc generation was trying to be too clever by allowing the surfaces
to go in and out of scope, with the hope of always having some kind
of screen present. That's not always the case, in particular during
atomic disable, so to make sure the surface, when present, is not
being actively destroyed at the same time, hold a reference to it.
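
The discipline applied is the usual take-a-reference-under-the-lock
pattern (a sketch matching the hunks below; use_surface() is
hypothetical):

    spin_lock_irq(&du->vkms.crc_state_lock);
    surf = vmw_surface_reference(du->vkms.surface);
    spin_unlock_irq(&du->vkms.crc_state_lock);

    if (surf) {
        /* destruction cannot complete while we hold the reference */
        use_surface(surf);
        vmw_surface_unreference(&surf);
    }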

Signed-off-by: Zack Rusin 
Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
index 3bfcf671fcd5..8651b788e98b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
@@ -75,7 +75,7 @@ vmw_surface_sync(struct vmw_private *vmw,
return ret;
 }
 
-static int
+static void
 compute_crc(struct drm_crtc *crtc,
struct vmw_surface *surf,
u32 *crc)
@@ -101,8 +101,6 @@ compute_crc(struct drm_crtc *crtc,
}
 
vmw_bo_unmap(bo);
-
-   return 0;
 }
 
 static void
@@ -116,7 +114,6 @@ crc_generate_worker(struct work_struct *work)
u64 frame_start, frame_end;
u32 crc32 = 0;
struct vmw_surface *surf = 0;
-   int ret;
 
spin_lock_irq(&du->vkms.crc_state_lock);
crc_pending = du->vkms.crc_pending;
@@ -130,22 +127,24 @@ crc_generate_worker(struct work_struct *work)
return;
 
spin_lock_irq(&du->vkms.crc_state_lock);
-   surf = du->vkms.surface;
+   surf = vmw_surface_reference(du->vkms.surface);
spin_unlock_irq(&du->vkms.crc_state_lock);
 
-   if (vmw_surface_sync(vmw, surf)) {
-   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
surface!\n");
-   return;
-   }
+   if (surf) {
+   if (vmw_surface_sync(vmw, surf)) {
+   drm_warn(
+   crtc->dev,
+   "CRC worker wasn't able to sync the crc 
surface!\n");
+   return;
+   }
 
-   ret = compute_crc(crtc, surf, &crc32);
-   if (ret)
-   return;
+   compute_crc(crtc, surf, &crc32);
+   vmw_surface_unreference(&surf);
+   }
 
spin_lock_irq(&du->vkms.crc_state_lock);
frame_start = du->vkms.frame_start;
frame_end = du->vkms.frame_end;
-   crc_pending = du->vkms.crc_pending;
du->vkms.frame_start = 0;
du->vkms.frame_end = 0;
du->vkms.crc_pending = false;
@@ -164,7 +163,7 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
struct vmw_display_unit *du = container_of(timer, struct 
vmw_display_unit, vkms.timer);
struct drm_crtc *crtc = &du->crtc;
struct vmw_private *vmw = vmw_priv(crtc->dev);
-   struct vmw_surface *surf = NULL;
+   bool has_surface = false;
u64 ret_overrun;
bool locked, ret;
 
@@ -179,10 +178,10 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
WARN_ON(!ret);
if (!locked)
return HRTIMER_RESTART;
-   surf = du->vkms.surface;
+   has_surface = du->vkms.surface != NULL;
vmw_vkms_unlock(crtc);
 
-   if (du->vkms.crc_enabled && surf) {
+   if (du->vkms.crc_enabled && has_surface) {
u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
spin_lock(&du->vkms.crc_state_lock);
@@ -336,6 +335,8 @@ vmw_vkms_crtc_cleanup(struct drm_crtc *crtc)
 {
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
 
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
WARN_ON(work_pending(&du->vkms.crc_generator_work));
hrtimer_cancel(&du->vkms.timer);
 }
@@ -497,9 +498,12 @@ vmw_vkms_set_crc_surface(struct drm_crtc *crtc,
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
struct vmw_private *vmw = vmw_priv(crtc->dev);
 
-   if (vmw->vkms_enabled) {
+   if (vmw->vkms_enabled && du->vkms.surface != surf) {
WARN_ON(atomic_read(&du->vkms.atomic_lock) != 
VMW_VKMS_LOCK_MODESET);
-   du->vkms.surface = surf;
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
+   if (surf)
+   du->vkms.surface = vmw_surface_reference(surf);
}
 }
 
-- 
2.43.0



[PATCH v5 1/4] drm/vmwgfx: Fix a deadlock in dma buf fence polling

2024-07-22 Thread Zack Rusin
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status; to do that it holds a lock to prevent
the list modifications from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.
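
Schematically, the failing chain is (a simplified sketch, not the
literal call stack):

    dma-buf poll callback
      -> vmwgfx wait callback            /* takes fman->lock */
        -> signals fence, drops the last reference
          -> vmw_fence_obj_destroy()
            -> spin_lock(&fman->lock)    /* already held: deadlock */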

Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock.

Signed-off-by: Zack Rusin 
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 5efc6a766f64..588d50ababf6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
 #define VMW_FENCE_WRAP (1 << 31)
 
 struct vmw_fence_manager {
-   int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -124,13 +123,13 @@ static void vmw_fence_obj_destroy(struct dma_fence *f)
 {
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
-
struct vmw_fence_manager *fman = fman_from_fence(fence);
 
-   spin_lock(&fman->lock);
-   list_del_init(&fence->head);
-   --fman->num_fence_objects;
-   spin_unlock(&fman->lock);
+   if (!list_empty(&fence->head)) {
+   spin_lock(&fman->lock);
+   list_del_init(&fence->head);
+   spin_unlock(&fman->lock);
+   }
fence->destroy(fence);
 }
 
@@ -257,7 +256,6 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
 };
 
-
 /*
  * Execute signal actions on fences recently signaled.
  * This is done from a workqueue so we don't have to execute
@@ -355,7 +353,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager 
*fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
-   ++fman->num_fence_objects;
 
 out_unlock:
spin_unlock(&fman->lock);
@@ -403,7 +400,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
  u32 passed_seqno)
 {
u32 goal_seqno;
-   struct vmw_fence_obj *fence;
+   struct vmw_fence_obj *fence, *next_fence;
 
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +410,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
return false;
 
fman->seqno_valid = false;
-   list_for_each_entry(fence, &fman->fence_list, head) {
+   list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_fence_goal_write(fman->dev_priv,
-- 
2.43.0



[PATCH v5 0/4] Fix various buffer mapping/import issues

2024-07-22 Thread Zack Rusin
This small series fixes all known prime/dumb_buffer/buffer dirty
tracking issues. Fixing of dumb-buffers turned out to be a lot more
complex than I wanted it to be. There's not much that can be done
there because the driver has to support old userspace (our Xorg driver
expects those to not be gem buffers and special cases a bunch of
functionality) and new userspace (which expects the handles to be
gem buffers, at least to issue GEM_CLOSE).

The third patch deals with it by making the objects returned from
dumb-buffers both (raw buffers and surfaces referenced by the same
handle), which always works and doesn't require any changes in userspace.

This fixes the known KDE (KWin's) buffer rendering issues.

v2: Fix compute_crc in the second patch, as spotted by Martin
v3: Simplify the first change which fixes the deadlock in the dma-buf
fence polling
v4: Fix mouse cursor races due to buffer mapping not being reserved in
the third patch
v5: Use map.virtual to check whether a user buffer is mapped instead
of map.bo. There's no change in functionality but we do use map.virtual
to check whether a buffer has been mapped elsewhere so usage of it
is more consistent. Spotted by Ian.

Zack Rusin (4):
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Add basic support for external buffers

 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c  |  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  62 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 502 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 280 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   |  40 +-
 15 files changed, 827 insertions(+), 534 deletions(-)

-- 
2.43.0



[PATCH v4 3/4] drm/vmwgfx: Fix handling of dumb buffers

2024-07-18 Thread Zack Rusin
Dumb buffers can be used in kms but also through prime with gallium's
resource_from_handle. In the second case the dumb buffers can be
rendered by the GPU where with the regular DRM kms interfaces they
are mapped and written to by the CPU. Because the same buffer can
be written to by the GPU and CPU vmwgfx needs to use vmw_surface (object
which properly tracks dirty state of the guest and gpu memory)
instead of vmw_bo (which is just guest side memory).

Furthermore the dumb buffer handles are expected to be gem objects by
a lot of userspace.

Make vmwgfx accept gem handles in prime and kms but internally switch
to vmw_surface's to properly track the dirty state of the objects between
the GPU and CPU.

Fixes new KWin and KDE on Wayland.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 499 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 280 +++-
 12 files changed, 737 insertions(+), 502 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h 
b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
index b0d87c5f58d8..1ac3cb151b11 100644
--- a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
+++ b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
@@ -1,6 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
- * Copyright 2021 VMware, Inc.
- * SPDX-License-Identifier: GPL-2.0 OR MIT
+ *
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -31,6 +33,10 @@
 
 #include 
 
+#define SVGA3D_FLAGS_UPPER_32(svga3d_flags) ((svga3d_flags) >> 32)
+#define SVGA3D_FLAGS_LOWER_32(svga3d_flags) \
+   ((svga3d_flags) & ((uint64_t)U32_MAX))
+
 static inline u32 clamped_umul32(u32 a, u32 b)
 {
uint64_t tmp = (uint64_t) a*b;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 00144632c600..f42ebc4a7c22 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright © 2011-2023 VMware, Inc., Palo Alto, CA., USA
- * All Rights Reserved.
+ * Copyright (c) 2011-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -28,15 +28,39 @@
 
 #include "vmwgfx_bo.h"
 #include "vmwgfx_drv.h"
-
+#include "vmwgfx_resource_priv.h"
 
 #include 
 
 static void vmw_bo_release(struct vmw_bo *vbo)
 {
+   struct vmw_resource *res;
+
WARN_ON(vbo->tbo.base.funcs &&
kref_read(&vbo->tbo.base.refcount) != 0);
vmw_bo_unmap(vbo);
+
+   xa_destroy(&vbo->detached_resources);
+   WARN_ON(vbo->is_dumb && !vbo->dumb_surface);
+   if (vbo->is_dumb && vbo->dumb_surface) {
+   res = &vbo->dumb_surface->res;
+   WARN_ON(vbo != res->guest_memory_bo);
+   WARN_ON(!res->guest_memory_bo);
+   if (res->guest_memory_bo) {
+   /* Reserve and switch the backing mob. */
+   mutex_lock(&res->dev_priv->cmdbuf_mutex);
+   (void)vmw_resource_reserve(res, false, true);
+   vmw_resource_mob_detach(res);
+   if (res->coherent)
+   vmw_bo_dirty_release(res->guest_memory_bo);
+   res->guest_memory_bo = NULL;
+   res->guest_memory_offset = 0;
+   vmw_resource_unreserve(res, false, false, false, NULL,
+  0);
+   mutex_unlock(&res->dev_priv->cmdbuf_mutex);
+   }
+   vmw_surface_un

[PATCH v4 4/4] drm/vmwgfx: Add basic support for external buffers

2024-07-18 Thread Zack Rusin
Make vmwgfx go through the dma-buf interface to map/unmap imported
buffers. The driver used to try to directly manipulate external
buffers, assuming that everything that was coming to it had to live
in cpu accessible memory. While technically true because what's in the
vms is controlled by us, it's semantically completely broken.

Fix importing of external buffers by forwarding all memory access
requests to the importer.

Tested by the vmw_prime basic_vgem test.

Signed-off-by: Zack Rusin 
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 62 +++--
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index 07185c108218..b9857f37ca1a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /*
- * Copyright 2021-2023 VMware, Inc.
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -78,6 +79,59 @@ static struct sg_table *vmw_gem_object_get_sg_table(struct 
drm_gem_object *obj)
return drm_prime_pages_to_sg(obj->dev, vmw_tt->dma_ttm.pages, 
vmw_tt->dma_ttm.num_pages);
 }
 
+static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
+   int ret;
+
+   if (obj->import_attach) {
+   ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
+   if (!ret) {
+   if (drm_WARN_ON(obj->dev, map->is_iomem)) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   return -EIO;
+   }
+   }
+   } else {
+   ret = ttm_bo_vmap(bo, map);
+   }
+
+   return ret;
+}
+
+static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   if (obj->import_attach)
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   else
+   drm_gem_ttm_vunmap(obj, map);
+}
+
+static int vmw_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+   int ret;
+
+   if (obj->import_attach) {
+   /*
+* Reset both vm_ops and vm_private_data, so we don't end up 
with
+* vm_ops pointing to our implementation if the dma-buf backend
+* doesn't set those fields.
+*/
+   vma->vm_private_data = NULL;
+   vma->vm_ops = NULL;
+
+   ret = dma_buf_mmap(obj->dma_buf, vma, 0);
+
+   /* Drop the reference drm_gem_mmap_obj() acquired.*/
+   if (!ret)
+   drm_gem_object_put(obj);
+
+   return ret;
+   }
+
+   return drm_gem_ttm_mmap(obj, vma);
+}
+
 static const struct vm_operations_struct vmw_vm_ops = {
.pfn_mkwrite = vmw_bo_vm_mkwrite,
.page_mkwrite = vmw_bo_vm_mkwrite,
@@ -94,9 +148,9 @@ static const struct drm_gem_object_funcs 
vmw_gem_object_funcs = {
.pin = vmw_gem_object_pin,
.unpin = vmw_gem_object_unpin,
.get_sg_table = vmw_gem_object_get_sg_table,
-   .vmap = drm_gem_ttm_vmap,
-   .vunmap = drm_gem_ttm_vunmap,
-   .mmap = drm_gem_ttm_mmap,
+   .vmap = vmw_gem_vmap,
+   .vunmap = vmw_gem_vunmap,
+   .mmap = vmw_gem_mmap,
.vm_ops = &vmw_vm_ops,
 };
 
-- 
2.43.0



[PATCH v4 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-07-18 Thread Zack Rusin
Fix race issues in virtual crc generation by making sure the surface
the code uses for crc computation is properly ref counted.

Crc generation was trying to be too clever by allowing the surfaces
to go in and out of scope, with the hope of always having some kind
of screen present. That's not always the case, in particular during
atomic disable, so to make sure the surface, when present, is not
being actively destroyed at the same time, hold a reference to it.

Signed-off-by: Zack Rusin 
Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
index 3bfcf671fcd5..8651b788e98b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
@@ -75,7 +75,7 @@ vmw_surface_sync(struct vmw_private *vmw,
return ret;
 }
 
-static int
+static void
 compute_crc(struct drm_crtc *crtc,
struct vmw_surface *surf,
u32 *crc)
@@ -101,8 +101,6 @@ compute_crc(struct drm_crtc *crtc,
}
 
vmw_bo_unmap(bo);
-
-   return 0;
 }
 
 static void
@@ -116,7 +114,6 @@ crc_generate_worker(struct work_struct *work)
u64 frame_start, frame_end;
u32 crc32 = 0;
struct vmw_surface *surf = 0;
-   int ret;
 
spin_lock_irq(&du->vkms.crc_state_lock);
crc_pending = du->vkms.crc_pending;
@@ -130,22 +127,24 @@ crc_generate_worker(struct work_struct *work)
return;
 
spin_lock_irq(&du->vkms.crc_state_lock);
-   surf = du->vkms.surface;
+   surf = vmw_surface_reference(du->vkms.surface);
spin_unlock_irq(&du->vkms.crc_state_lock);
 
-   if (vmw_surface_sync(vmw, surf)) {
-   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
surface!\n");
-   return;
-   }
+   if (surf) {
+   if (vmw_surface_sync(vmw, surf)) {
+   drm_warn(
+   crtc->dev,
+   "CRC worker wasn't able to sync the crc 
surface!\n");
+   return;
+   }
 
-   ret = compute_crc(crtc, surf, &crc32);
-   if (ret)
-   return;
+   compute_crc(crtc, surf, &crc32);
+   vmw_surface_unreference(&surf);
+   }
 
spin_lock_irq(&du->vkms.crc_state_lock);
frame_start = du->vkms.frame_start;
frame_end = du->vkms.frame_end;
-   crc_pending = du->vkms.crc_pending;
du->vkms.frame_start = 0;
du->vkms.frame_end = 0;
du->vkms.crc_pending = false;
@@ -164,7 +163,7 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
struct vmw_display_unit *du = container_of(timer, struct 
vmw_display_unit, vkms.timer);
struct drm_crtc *crtc = &du->crtc;
struct vmw_private *vmw = vmw_priv(crtc->dev);
-   struct vmw_surface *surf = NULL;
+   bool has_surface = false;
u64 ret_overrun;
bool locked, ret;
 
@@ -179,10 +178,10 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
WARN_ON(!ret);
if (!locked)
return HRTIMER_RESTART;
-   surf = du->vkms.surface;
+   has_surface = du->vkms.surface != NULL;
vmw_vkms_unlock(crtc);
 
-   if (du->vkms.crc_enabled && surf) {
+   if (du->vkms.crc_enabled && has_surface) {
u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
spin_lock(&du->vkms.crc_state_lock);
@@ -336,6 +335,8 @@ vmw_vkms_crtc_cleanup(struct drm_crtc *crtc)
 {
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
 
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
WARN_ON(work_pending(&du->vkms.crc_generator_work));
hrtimer_cancel(&du->vkms.timer);
 }
@@ -497,9 +498,12 @@ vmw_vkms_set_crc_surface(struct drm_crtc *crtc,
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
struct vmw_private *vmw = vmw_priv(crtc->dev);
 
-   if (vmw->vkms_enabled) {
+   if (vmw->vkms_enabled && du->vkms.surface != surf) {
WARN_ON(atomic_read(&du->vkms.atomic_lock) != 
VMW_VKMS_LOCK_MODESET);
-   du->vkms.surface = surf;
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
+   if (surf)
+   du->vkms.surface = vmw_surface_reference(surf);
}
 }
 
-- 
2.43.0



[PATCH v4 1/4] drm/vmwgfx: Fix a deadlock in dma buf fence polling

2024-07-18 Thread Zack Rusin
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status; to do that it holds a lock to prevent
the list modifications from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.

Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock.

Signed-off-by: Zack Rusin 
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 5efc6a766f64..588d50ababf6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
 #define VMW_FENCE_WRAP (1 << 31)
 
 struct vmw_fence_manager {
-   int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -124,13 +123,13 @@ static void vmw_fence_obj_destroy(struct dma_fence *f)
 {
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
-
struct vmw_fence_manager *fman = fman_from_fence(fence);
 
-   spin_lock(&fman->lock);
-   list_del_init(&fence->head);
-   --fman->num_fence_objects;
-   spin_unlock(&fman->lock);
+   if (!list_empty(&fence->head)) {
+   spin_lock(&fman->lock);
+   list_del_init(&fence->head);
+   spin_unlock(&fman->lock);
+   }
fence->destroy(fence);
 }
 
@@ -257,7 +256,6 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
 };
 
-
 /*
  * Execute signal actions on fences recently signaled.
  * This is done from a workqueue so we don't have to execute
@@ -355,7 +353,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager 
*fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
-   ++fman->num_fence_objects;
 
 out_unlock:
spin_unlock(&fman->lock);
@@ -403,7 +400,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
  u32 passed_seqno)
 {
u32 goal_seqno;
-   struct vmw_fence_obj *fence;
+   struct vmw_fence_obj *fence, *next_fence;
 
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +410,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
return false;
 
fman->seqno_valid = false;
-   list_for_each_entry(fence, &fman->fence_list, head) {
+   list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_fence_goal_write(fman->dev_priv,
-- 
2.43.0



[PATCH v4 0/4] Fix various buffer mapping/import issues

2024-07-18 Thread Zack Rusin
This small series fixes all known prime/dumb_buffer/buffer dirty
tracking issues. Fixing of dumb-buffers turned out to be a lot more
complex than I wanted it to be. There's not much that can be done
there because the driver has to support old userspace (our Xorg driver
expects those to not be gem buffers and special cases a bunch of
functionality) and new userspace (which expects the handles to be
gem buffers, at least to issue GEM_CLOSE).

The third patch deals with it by making the objects returned from
dumb-buffers both (raw buffers and surfaces referenced by the same
handle), which always works and doesn't require any changes in userspace.

This fixes the known KDE (KWin's) buffer rendering issues.

v2: Fix compute_crc in the second patch, as spotted by Martin
v3: Simplify the first change which fixes the deadlock in the dma-buf
fence polling
v4: Fix mouse cursor races due to buffer mapping not being reserved in
the third patch

Zack Rusin (4):
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Add basic support for external buffers

 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c  |  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  62 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 499 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 280 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   |  40 +-
 15 files changed, 824 insertions(+), 534 deletions(-)

-- 
2.43.0



Re: [PATCH v2] drm/qxl: Pin buffer objects for internal mappings

2024-07-08 Thread Zack Rusin
On Mon, Jul 8, 2024 at 10:22 AM Thomas Zimmermann  wrote:
>
> Add qxl_bo_pin_and_vmap() that pins and vmaps a buffer object in one
> step. Update callers of the regular qxl_bo_vmap(). Fixes a bug where
> qxl accesses an unpinned buffer object while it is being moved; such
> as with the monitor-description BO. A typical error is shown below.
>
> [4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 
> 65376256x16777216+0+0
> [4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 
> 65376256x16777216+0+0
> [4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 
> 65335296x16777216+0+0
> [5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in 
> release_idr
>
> Commit b33651a5c98d ("drm/qxl: Do not pin buffer objects for vmap")
> removed the implicit pin operation from qxl's vmap code. This is the
> correct behavior for GEM and PRIME interfaces, but the pin is still
> needed for qxl internal operation.
>
> Also add a corresponding function qxl_bo_vunmap_and_unpin() and remove
> the old qxl_bo_vmap() helpers.
>
> Future directions: BOs should not be pinned or vmapped unnecessarily.
> The pin-and-vmap operation should be removed from the driver and a
> temporary mapping should be established with a vmap_local-like helper.
> See the client helper drm_client_buffer_vmap_local() for semantics.
>
> v2:
> - unreserve BO on errors in qxl_bo_pin_and_vmap() (Dmitry)
>
> Signed-off-by: Thomas Zimmermann 
> Fixes: b33651a5c98d ("drm/qxl: Do not pin buffer objects for vmap")
> Reported-by: David Kaplan 
> Closes: 
> https://lore.kernel.org/dri-devel/ab0fb17d-0f96-4ee6-8b21-65d02bb02...@suse.de/
> Tested-by: David Kaplan 
> Reviewed-by: Daniel Vetter 
> Cc: Thomas Zimmermann 
> Cc: Dmitry Osipenko 
> Cc: Christian König 
> Cc: Zack Rusin 
> Cc: Dave Airlie 
> Cc: Gerd Hoffmann 
> Cc: virtualizat...@lists.linux.dev
> Cc: spice-de...@lists.freedesktop.org
> ---
>  drivers/gpu/drm/qxl/qxl_display.c | 14 +++---
>  drivers/gpu/drm/qxl/qxl_object.c  | 13 +++--
>  drivers/gpu/drm/qxl/qxl_object.h  |  4 ++--
>  3 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
> b/drivers/gpu/drm/qxl/qxl_display.c
> index 86a5dea710c0..bc24af08dfcd 100644
> --- a/drivers/gpu/drm/qxl/qxl_display.c
> +++ b/drivers/gpu/drm/qxl/qxl_display.c
> @@ -584,11 +584,11 @@ static struct qxl_bo *qxl_create_cursor(struct 
> qxl_device *qdev,
> if (ret)
> goto err;
>
> -   ret = qxl_bo_vmap(cursor_bo, &cursor_map);
> +   ret = qxl_bo_pin_and_vmap(cursor_bo, &cursor_map);
> if (ret)
> goto err_unref;
>
> -   ret = qxl_bo_vmap(user_bo, &user_map);
> +   ret = qxl_bo_pin_and_vmap(user_bo, &user_map);
> if (ret)
> goto err_unmap;
>
> @@ -614,12 +614,12 @@ static struct qxl_bo *qxl_create_cursor(struct 
> qxl_device *qdev,
>user_map.vaddr, size);
> }
>
> -   qxl_bo_vunmap(user_bo);
> -   qxl_bo_vunmap(cursor_bo);
> +   qxl_bo_vunmap_and_unpin(user_bo);
> +   qxl_bo_vunmap_and_unpin(cursor_bo);
> return cursor_bo;
>
>  err_unmap:
> -   qxl_bo_vunmap(cursor_bo);
> +   qxl_bo_vunmap_and_unpin(cursor_bo);
>  err_unref:
> qxl_bo_unpin(cursor_bo);
> qxl_bo_unref(&cursor_bo);
> @@ -1205,7 +1205,7 @@ int qxl_create_monitors_object(struct qxl_device *qdev)
> }
> qdev->monitors_config_bo = gem_to_qxl_bo(gobj);
>
> -   ret = qxl_bo_vmap(qdev->monitors_config_bo, &map);
> +   ret = qxl_bo_pin_and_vmap(qdev->monitors_config_bo, &map);
> if (ret)
> return ret;
>
> @@ -1236,7 +1236,7 @@ int qxl_destroy_monitors_object(struct qxl_device *qdev)
> qdev->monitors_config = NULL;
> qdev->ram_header->monitors_config = 0;
>
> -   ret = qxl_bo_vunmap(qdev->monitors_config_bo);
> +   ret = qxl_bo_vunmap_and_unpin(qdev->monitors_config_bo);
> if (ret)
> return ret;
>
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c 
> b/drivers/gpu/drm/qxl/qxl_object.c
> index 5893e27a7ae5..66635c55cf85 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -182,7 +182,7 @@ int qxl_bo_vmap_locked(struct qxl_bo *bo, struct 
> iosys_map *map)
> return 0;
>  }
>
> -int qxl_bo_vmap(struct qxl_bo *bo, struct iosys_map *map)
> +int qxl_bo_pin_and_vmap(struct qxl_bo *bo, struct iosys_map *map)
>  {
> int r;
>
>
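
The resulting rule for qxl-internal mappings: every pin-and-vmap must be
paired with a vunmap-and-unpin. A minimal sketch with the helpers
introduced above (error handling elided):

    struct iosys_map map;
    int ret;

    ret = qxl_bo_pin_and_vmap(bo, &map);    /* pins, then vmaps */
    if (ret)
        return ret;
    /* ... access the BO through map.vaddr ... */
    qxl_bo_vunmap_and_unpin(bo);            /* unmaps, then unpins */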

[PATCH v3 3/4] drm/vmwgfx: Fix handling of dumb buffers

2024-07-01 Thread Zack Rusin
Dumb buffers can be used in kms but also through prime with gallium's
resource_from_handle. In the second case the dumb buffers can be
rendered by the GPU where with the regular DRM kms interfaces they
are mapped and written to by the CPU. Because the same buffer can
be written to by the GPU and CPU vmwgfx needs to use vmw_surface (object
which properly tracks dirty state of the guest and gpu memory)
instead of vmw_bo (which is just guest side memory).

Furthermore the dumb buffer handles are expected to be gem objects by
a lot of userspace.

Make vmwgfx accept gem handles in prime and kms but internally switch
to vmw_surface's to properly track the dirty state of the objects between
the GPU and CPU.

Fixes new KWin and KDE on Wayland.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 12 files changed, 688 insertions(+), 502 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h 
b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
index b0d87c5f58d8..1ac3cb151b11 100644
--- a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
+++ b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
@@ -1,6 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
- * Copyright 2021 VMware, Inc.
- * SPDX-License-Identifier: GPL-2.0 OR MIT
+ *
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -31,6 +33,10 @@
 
 #include 
 
+#define SVGA3D_FLAGS_UPPER_32(svga3d_flags) ((svga3d_flags) >> 32)
+#define SVGA3D_FLAGS_LOWER_32(svga3d_flags) \
+   ((svga3d_flags) & ((uint64_t)U32_MAX))
+
 static inline u32 clamped_umul32(u32 a, u32 b)
 {
uint64_t tmp = (uint64_t) a*b;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index e5eb21a471a6..f6fafb1fc5d8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright © 2011-2023 VMware, Inc., Palo Alto, CA., USA
- * All Rights Reserved.
+ * Copyright (c) 2011-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -28,15 +28,39 @@
 
 #include "vmwgfx_bo.h"
 #include "vmwgfx_drv.h"
-
+#include "vmwgfx_resource_priv.h"
 
 #include 
 
 static void vmw_bo_release(struct vmw_bo *vbo)
 {
+   struct vmw_resource *res;
+
WARN_ON(vbo->tbo.base.funcs &&
kref_read(&vbo->tbo.base.refcount) != 0);
vmw_bo_unmap(vbo);
+
+   xa_destroy(&vbo->detached_resources);
+   WARN_ON(vbo->is_dumb && !vbo->dumb_surface);
+   if (vbo->is_dumb && vbo->dumb_surface) {
+   res = &vbo->dumb_surface->res;
+   WARN_ON(vbo != res->guest_memory_bo);
+   WARN_ON(!res->guest_memory_bo);
+   if (res->guest_memory_bo) {
+   /* Reserve and switch the backing mob. */
+   mutex_lock(&res->dev_priv->cmdbuf_mutex);
+   (void)vmw_resource_reserve(res, false, true);
+   vmw_resource_mob_detach(res);
+   if (res->coherent)
+   vmw_bo_dirty_release(res->guest_memory_bo);
+   res->guest_memory_bo = NULL;
+   res->guest_memory_offset = 0;
+   vmw_resource_unreserve(res, false, false, false, NULL,
+  0);
+   mutex_unlock(&res->dev_priv->cmdbuf_mutex);
+   }
+   vmw_surf

[PATCH v3 4/4] drm/vmwgfx: Add basic support for external buffers

2024-07-01 Thread Zack Rusin
Make vmwgfx go through the dma-buf interface to map/unmap imported
buffers. The driver used to try to directly manipulate external
buffers, assuming that everything that was coming to it had to live
in cpu accessible memory. While technically true because what's in the
vms is controlled by us, it's semantically completely broken.

Fix importing of external buffers by forwarding all memory access
requests to the importer.

Tested by the vmw_prime basic_vgem test.

Signed-off-by: Zack Rusin 
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 62 +++--
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index 07185c108218..07567d9519ec 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /*
- * Copyright 2021-2023 VMware, Inc.
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -78,6 +79,59 @@ static struct sg_table *vmw_gem_object_get_sg_table(struct 
drm_gem_object *obj)
return drm_prime_pages_to_sg(obj->dev, vmw_tt->dma_ttm.pages, 
vmw_tt->dma_ttm.num_pages);
 }
 
+static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
+   int ret;
+
+   if (obj->import_attach) {
+   ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
+   if (!ret) {
+   if (drm_WARN_ON(obj->dev, map->is_iomem)) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   return -EIO;
+   }
+   }
+   } else {
+   ret = ttm_bo_vmap(bo, map);
+   }
+
+   return ret;
+}
+
+static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   if (obj->import_attach) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   } else {
+   drm_gem_ttm_vunmap(obj, map);
+   }
+}
+
+static int vmw_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+   int ret;
+
+   if (obj->import_attach) {
+   /* Reset both vm_ops and vm_private_data, so we don't end up 
with
+* vm_ops pointing to our implementation if the dma-buf backend
+* doesn't set those fields.
+*/
+   vma->vm_private_data = NULL;
+   vma->vm_ops = NULL;
+
+   ret = dma_buf_mmap(obj->dma_buf, vma, 0);
+
+   /* Drop the reference drm_gem_mmap_obj() acquired.*/
+   if (!ret)
+   drm_gem_object_put(obj);
+
+   return ret;
+   }
+
+   return drm_gem_ttm_mmap(obj, vma);
+}
+
 static const struct vm_operations_struct vmw_vm_ops = {
.pfn_mkwrite = vmw_bo_vm_mkwrite,
.page_mkwrite = vmw_bo_vm_mkwrite,
@@ -94,9 +148,9 @@ static const struct drm_gem_object_funcs 
vmw_gem_object_funcs = {
.pin = vmw_gem_object_pin,
.unpin = vmw_gem_object_unpin,
.get_sg_table = vmw_gem_object_get_sg_table,
-   .vmap = drm_gem_ttm_vmap,
-   .vunmap = drm_gem_ttm_vunmap,
-   .mmap = drm_gem_ttm_mmap,
+   .vmap = vmw_gem_vmap,
+   .vunmap = vmw_gem_vunmap,
+   .mmap = vmw_gem_mmap,
.vm_ops = &vmw_vm_ops,
 };
 
-- 
2.43.0



[PATCH v3 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-07-01 Thread Zack Rusin
Fix race issues in virtual crc generation by making sure the surface
the code uses for crc computation is properly ref counted.

Crc generation was trying to be too clever by allowing the surfaces
to go in and out of scope, with the hope of always having some kind
of screen present. That's not always the case, in particular during
atomic disable, so to make sure the surface, when present, is not
being actively destroyed at the same time, hold a reference to it.

Signed-off-by: Zack Rusin 
Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
index 3bfcf671fcd5..8651b788e98b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
@@ -75,7 +75,7 @@ vmw_surface_sync(struct vmw_private *vmw,
return ret;
 }
 
-static int
+static void
 compute_crc(struct drm_crtc *crtc,
struct vmw_surface *surf,
u32 *crc)
@@ -101,8 +101,6 @@ compute_crc(struct drm_crtc *crtc,
}
 
vmw_bo_unmap(bo);
-
-   return 0;
 }
 
 static void
@@ -116,7 +114,6 @@ crc_generate_worker(struct work_struct *work)
u64 frame_start, frame_end;
u32 crc32 = 0;
struct vmw_surface *surf = 0;
-   int ret;
 
spin_lock_irq(&du->vkms.crc_state_lock);
crc_pending = du->vkms.crc_pending;
@@ -130,22 +127,24 @@ crc_generate_worker(struct work_struct *work)
return;
 
spin_lock_irq(&du->vkms.crc_state_lock);
-   surf = du->vkms.surface;
+   surf = vmw_surface_reference(du->vkms.surface);
spin_unlock_irq(&du->vkms.crc_state_lock);
 
-   if (vmw_surface_sync(vmw, surf)) {
-   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
surface!\n");
-   return;
-   }
+   if (surf) {
+   if (vmw_surface_sync(vmw, surf)) {
+   drm_warn(
+   crtc->dev,
+   "CRC worker wasn't able to sync the crc 
surface!\n");
+   return;
+   }
 
-   ret = compute_crc(crtc, surf, &crc32);
-   if (ret)
-   return;
+   compute_crc(crtc, surf, &crc32);
+   vmw_surface_unreference(&surf);
+   }
 
spin_lock_irq(&du->vkms.crc_state_lock);
frame_start = du->vkms.frame_start;
frame_end = du->vkms.frame_end;
-   crc_pending = du->vkms.crc_pending;
du->vkms.frame_start = 0;
du->vkms.frame_end = 0;
du->vkms.crc_pending = false;
@@ -164,7 +163,7 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
struct vmw_display_unit *du = container_of(timer, struct 
vmw_display_unit, vkms.timer);
struct drm_crtc *crtc = &du->crtc;
struct vmw_private *vmw = vmw_priv(crtc->dev);
-   struct vmw_surface *surf = NULL;
+   bool has_surface = false;
u64 ret_overrun;
bool locked, ret;
 
@@ -179,10 +178,10 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
WARN_ON(!ret);
if (!locked)
return HRTIMER_RESTART;
-   surf = du->vkms.surface;
+   has_surface = du->vkms.surface != NULL;
vmw_vkms_unlock(crtc);
 
-   if (du->vkms.crc_enabled && surf) {
+   if (du->vkms.crc_enabled && has_surface) {
u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
spin_lock(&du->vkms.crc_state_lock);
@@ -336,6 +335,8 @@ vmw_vkms_crtc_cleanup(struct drm_crtc *crtc)
 {
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
 
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
WARN_ON(work_pending(&du->vkms.crc_generator_work));
hrtimer_cancel(&du->vkms.timer);
 }
@@ -497,9 +498,12 @@ vmw_vkms_set_crc_surface(struct drm_crtc *crtc,
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
struct vmw_private *vmw = vmw_priv(crtc->dev);
 
-   if (vmw->vkms_enabled) {
+   if (vmw->vkms_enabled && du->vkms.surface != surf) {
WARN_ON(atomic_read(&du->vkms.atomic_lock) != 
VMW_VKMS_LOCK_MODESET);
-   du->vkms.surface = surf;
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
+   if (surf)
+   du->vkms.surface = vmw_surface_reference(surf);
}
 }
 
-- 
2.43.0



[PATCH v3 1/4] drm/vmwgfx: Fix a deadlock in dma buf fence polling

2024-07-01 Thread Zack Rusin
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status; to do that it holds a lock to prevent
the list modifications from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.

Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock.

Signed-off-by: Zack Rusin 
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 5efc6a766f64..588d50ababf6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
 #define VMW_FENCE_WRAP (1 << 31)
 
 struct vmw_fence_manager {
-   int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -124,13 +123,13 @@ static void vmw_fence_obj_destroy(struct dma_fence *f)
 {
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
-
struct vmw_fence_manager *fman = fman_from_fence(fence);
 
-   spin_lock(&fman->lock);
-   list_del_init(&fence->head);
-   --fman->num_fence_objects;
-   spin_unlock(&fman->lock);
+   if (!list_empty(&fence->head)) {
+   spin_lock(&fman->lock);
+   list_del_init(&fence->head);
+   spin_unlock(&fman->lock);
+   }
fence->destroy(fence);
 }
 
@@ -257,7 +256,6 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
 };
 
-
 /*
  * Execute signal actions on fences recently signaled.
  * This is done from a workqueue so we don't have to execute
@@ -355,7 +353,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager 
*fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
-   ++fman->num_fence_objects;
 
 out_unlock:
spin_unlock(&fman->lock);
@@ -403,7 +400,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
  u32 passed_seqno)
 {
u32 goal_seqno;
-   struct vmw_fence_obj *fence;
+   struct vmw_fence_obj *fence, *next_fence;
 
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +410,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
return false;
 
fman->seqno_valid = false;
-   list_for_each_entry(fence, &fman->fence_list, head) {
+   list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_fence_goal_write(fman->dev_priv,
-- 
2.43.0



[PATCH v3 0/4] Fix various buffer mapping/import issues

2024-07-01 Thread Zack Rusin
This small series fixes all known prime/dumb_buffer/buffer dirty
tracking issues. Fixing of dumb-buffers turned out to be a lot more
complex than I wanted it to be. There's not much that can be done
there because the driver has to support old userspace (our Xorg driver
expects those to not be gem buffers and special cases a bunch of
functionality) and new userspace (which expects the handles to be
gem buffers, at least to issue GEM_CLOSE).

The third patch deals with it by making the objects returned from
dumb-buffers both (raw buffers and surfaces referenced by the same
handle), which always works and doesn't require any changes in userspace.

This fixes the known KDE (KWin's) buffer rendering issues.

v2: Fix compute_crc in the second patch, as spotted by Martin
v3: Simplify the first change which fixes the deadlock in the dma-buf
fence polling

Zack Rusin (4):
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Add basic support for external buffers

 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c  |  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  62 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   |  40 +-
 15 files changed, 775 insertions(+), 534 deletions(-)

-- 
2.43.0



[PATCH v2 1/4] drm/vmwgfx: Fix a deadlock in dma buf fence polling

2024-06-28 Thread Zack Rusin
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status; to do that it holds a lock to prevent
the list modifications from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.

Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those; in particular this fixes KDE stalls/deadlocks.
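
The mechanism, distilled from the diff below: once a fence has been
taken off the pending list (with fman->lock already held on the
signaling path), its ops pointer is swapped so that the eventual final
unref goes through a release callback that skips the list removal and
therefore never needs the lock:

    /* on the signaling path, under fman->lock */
    fence->base.ops = &vmw_fence_ops_removed; /* release skips list_del */
    list_del_init(&fence->head);
    dma_fence_signal_locked(&fence->base);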

Signed-off-by: Zack Rusin 
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 5efc6a766f64..76971ef7801a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
 #define VMW_FENCE_WRAP (1 << 31)
 
 struct vmw_fence_manager {
-   int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -120,16 +119,23 @@ static void vmw_fence_goal_write(struct vmw_private *vmw, 
u32 value)
  * objects with actions attached to them.
  */
 
-static void vmw_fence_obj_destroy(struct dma_fence *f)
+static void vmw_fence_obj_destroy_removed(struct dma_fence *f)
 {
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
 
+   WARN_ON(!list_empty(&fence->head));
+   fence->destroy(fence);
+}
+
+static void vmw_fence_obj_destroy(struct dma_fence *f)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
struct vmw_fence_manager *fman = fman_from_fence(fence);
 
spin_lock(&fman->lock);
list_del_init(&fence->head);
-   --fman->num_fence_objects;
spin_unlock(&fman->lock);
fence->destroy(fence);
 }
@@ -257,6 +263,13 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
 };
 
+static const struct dma_fence_ops vmw_fence_ops_removed = {
+   .get_driver_name = vmw_fence_get_driver_name,
+   .get_timeline_name = vmw_fence_get_timeline_name,
+   .enable_signaling = vmw_fence_enable_signaling,
+   .wait = vmw_fence_wait,
+   .release = vmw_fence_obj_destroy_removed,
+};
 
 /*
  * Execute signal actions on fences recently signaled.
@@ -355,7 +368,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager 
*fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
-   ++fman->num_fence_objects;
 
 out_unlock:
spin_unlock(&fman->lock);
@@ -403,7 +415,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
  u32 passed_seqno)
 {
u32 goal_seqno;
-   struct vmw_fence_obj *fence;
+   struct vmw_fence_obj *fence, *next_fence;
 
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +425,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
return false;
 
fman->seqno_valid = false;
-   list_for_each_entry(fence, &fman->fence_list, head) {
+   list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_fence_goal_write(fman->dev_priv,
@@ -471,6 +483,7 @@ static void __vmw_fences_update(struct vmw_fence_manager 
*fman)
 rerun:
list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
+   fence->base.ops = &vmw_fence_ops_removed;
list_del_init(&fence->head);
dma_fence_signal_locked(&fence->base);
INIT_LIST_HEAD(&action_list);
@@ -662,6 +675,7 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 VMW_FENCE_WAIT_TIMEOUT);
 
if (unlikely(ret != 0)) {
+   fence->base.ops = &vmw_fence_ops_removed;
list_del_init(&fence->head);
dma_fence_signal(&fence->base);
INIT_LIST_HEAD(&action_list);
-- 
2.40.1



[PATCH v2 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-06-28 Thread Zack Rusin
Fix race issues in virtual crc generation by making sure the surface
the code uses for crc computation is properly ref counted.

Crc generation was trying to be too clever by allowing the surfaces
to go in and out of scope, with the hope of always having some kind
of screen present. That's not always the case, in particular during
atomic disable, so to make sure the surface, when present, is not
being actively destroyed at the same time, hold a reference to it.
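
The resulting pattern in the crc worker, distilled from the diff below
(this assumes, as the patched code does, that vmw_surface_reference()
tolerates a NULL surface):

    spin_lock_irq(&du->vkms.crc_state_lock);
    surf = vmw_surface_reference(du->vkms.surface); /* may be NULL */
    spin_unlock_irq(&du->vkms.crc_state_lock);

    if (surf) {
            /* safe to use outside the lock: we hold a reference */
            /* ... sync and compute the crc here ... */
            vmw_surface_unreference(&surf);
    }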

Signed-off-by: Zack Rusin 
Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
Cc: Zack Rusin 
Cc: Martin Krastev 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
index 3bfcf671fcd5..8651b788e98b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
@@ -75,7 +75,7 @@ vmw_surface_sync(struct vmw_private *vmw,
return ret;
 }
 
-static int
+static void
 compute_crc(struct drm_crtc *crtc,
struct vmw_surface *surf,
u32 *crc)
@@ -101,8 +101,6 @@ compute_crc(struct drm_crtc *crtc,
}
 
vmw_bo_unmap(bo);
-
-   return 0;
 }
 
 static void
@@ -116,7 +114,6 @@ crc_generate_worker(struct work_struct *work)
u64 frame_start, frame_end;
u32 crc32 = 0;
struct vmw_surface *surf = 0;
-   int ret;
 
spin_lock_irq(&du->vkms.crc_state_lock);
crc_pending = du->vkms.crc_pending;
@@ -130,22 +127,24 @@ crc_generate_worker(struct work_struct *work)
return;
 
spin_lock_irq(&du->vkms.crc_state_lock);
-   surf = du->vkms.surface;
+   surf = vmw_surface_reference(du->vkms.surface);
spin_unlock_irq(&du->vkms.crc_state_lock);
 
-   if (vmw_surface_sync(vmw, surf)) {
-   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
surface!\n");
-   return;
-   }
+   if (surf) {
+   if (vmw_surface_sync(vmw, surf)) {
+   drm_warn(
+   crtc->dev,
+   "CRC worker wasn't able to sync the crc 
surface!\n");
+   return;
+   }
 
-   ret = compute_crc(crtc, surf, &crc32);
-   if (ret)
-   return;
+   compute_crc(crtc, surf, &crc32);
+   vmw_surface_unreference(&surf);
+   }
 
spin_lock_irq(&du->vkms.crc_state_lock);
frame_start = du->vkms.frame_start;
frame_end = du->vkms.frame_end;
-   crc_pending = du->vkms.crc_pending;
du->vkms.frame_start = 0;
du->vkms.frame_end = 0;
du->vkms.crc_pending = false;
@@ -164,7 +163,7 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
struct vmw_display_unit *du = container_of(timer, struct 
vmw_display_unit, vkms.timer);
struct drm_crtc *crtc = &du->crtc;
struct vmw_private *vmw = vmw_priv(crtc->dev);
-   struct vmw_surface *surf = NULL;
+   bool has_surface = false;
u64 ret_overrun;
bool locked, ret;
 
@@ -179,10 +178,10 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
WARN_ON(!ret);
if (!locked)
return HRTIMER_RESTART;
-   surf = du->vkms.surface;
+   has_surface = du->vkms.surface != NULL;
vmw_vkms_unlock(crtc);
 
-   if (du->vkms.crc_enabled && surf) {
+   if (du->vkms.crc_enabled && has_surface) {
u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
spin_lock(&du->vkms.crc_state_lock);
@@ -336,6 +335,8 @@ vmw_vkms_crtc_cleanup(struct drm_crtc *crtc)
 {
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
 
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
WARN_ON(work_pending(&du->vkms.crc_generator_work));
hrtimer_cancel(&du->vkms.timer);
 }
@@ -497,9 +498,12 @@ vmw_vkms_set_crc_surface(struct drm_crtc *crtc,
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
struct vmw_private *vmw = vmw_priv(crtc->dev);
 
-   if (vmw->vkms_enabled) {
+   if (vmw->vkms_enabled && du->vkms.surface != surf) {
WARN_ON(atomic_read(&du->vkms.atomic_lock) != 
VMW_VKMS_LOCK_MODESET);
-   du->vkms.surface = surf;
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
+   if (surf)
+   du->vkms.surface = vmw_surface_reference(surf);
}
 }
 
-- 
2.40.1



[PATCH v2 3/4] drm/vmwgfx: Fix handling of dumb buffers

2024-06-28 Thread Zack Rusin
Dumb buffers can be used in kms but also through prime with gallium's
resource_from_handle. In the second case the dumb buffers can be
rendered by the GPU, whereas with the regular DRM kms interfaces they
are mapped and written to by the CPU. Because the same buffer can
be written to by the GPU and CPU vmwgfx needs to use vmw_surface (object
which properly tracks dirty state of the guest and gpu memory)
instead of vmw_bo (which is just guest side memory).

Furthermore the dumb buffer handles are expected to be gem objects by
a lot of userspace.

Make vmwgfx accept gem handles in prime and kms but internally switch
to vmw_surfaces to properly track the dirty state of the objects between
the GPU and CPU.

Fixes new KWin and KDE on Wayland.
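
The userspace expectation in question, sketched with the generic DRM
ioctls (illustrative only; the sizes are arbitrary and error handling
is omitted):

    struct drm_mode_create_dumb create = {
            .width = 64, .height = 64, .bpp = 32,
    };
    ioctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);
    /* new userspace treats create.handle as a gem handle ... */
    struct drm_gem_close close_req = { .handle = create.handle };
    ioctl(fd, DRM_IOCTL_GEM_CLOSE, &close_req); /* ... and expects this to work */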

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 12 files changed, 688 insertions(+), 502 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h 
b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
index b0d87c5f58d8..1ac3cb151b11 100644
--- a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
+++ b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
@@ -1,6 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
- * Copyright 2021 VMware, Inc.
- * SPDX-License-Identifier: GPL-2.0 OR MIT
+ *
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -31,6 +33,10 @@
 
 #include 
 
+#define SVGA3D_FLAGS_UPPER_32(svga3d_flags) ((svga3d_flags) >> 32)
+#define SVGA3D_FLAGS_LOWER_32(svga3d_flags) \
+   ((svga3d_flags) & ((uint64_t)U32_MAX))
+
 static inline u32 clamped_umul32(u32 a, u32 b)
 {
uint64_t tmp = (uint64_t) a*b;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index e5eb21a471a6..f6fafb1fc5d8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright © 2011-2023 VMware, Inc., Palo Alto, CA., USA
- * All Rights Reserved.
+ * Copyright (c) 2011-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -28,15 +28,39 @@
 
 #include "vmwgfx_bo.h"
 #include "vmwgfx_drv.h"
-
+#include "vmwgfx_resource_priv.h"
 
 #include 
 
 static void vmw_bo_release(struct vmw_bo *vbo)
 {
+   struct vmw_resource *res;
+
WARN_ON(vbo->tbo.base.funcs &&
kref_read(&vbo->tbo.base.refcount) != 0);
vmw_bo_unmap(vbo);
+
+   xa_destroy(&vbo->detached_resources);
+   WARN_ON(vbo->is_dumb && !vbo->dumb_surface);
+   if (vbo->is_dumb && vbo->dumb_surface) {
+   res = &vbo->dumb_surface->res;
+   WARN_ON(vbo != res->guest_memory_bo);
+   WARN_ON(!res->guest_memory_bo);
+   if (res->guest_memory_bo) {
+   /* Reserve and switch the backing mob. */
+   mutex_lock(&res->dev_priv->cmdbuf_mutex);
+   (void)vmw_resource_reserve(res, false, true);
+   vmw_resource_mob_detach(res);
+   if (res->coherent)
+   vmw_bo_dirty_release(res->guest_memory_bo);
+   res->guest_memory_bo = NULL;
+   res->guest_memory_offset = 0;
+   vmw_resource_unreserve(res, false, false, false, NULL,
+  0);
+   mutex_unlock(&res->dev_priv->cmdbuf_mutex);
+   }
+   vmw_surface_unreference(&vbo->dumb_surface);
+   }

[PATCH v2 4/4] drm/vmwgfx: Add basic support for external buffers

2024-06-28 Thread Zack Rusin
Make vmwgfx go through the dma-buf interface to map/unmap imported
buffers. The driver used to try to directly manipulate external
buffers, assuming that everything that was coming to it had to live
in cpu accessible memory. While technically true (because what's in
the VMs is controlled by us), it's semantically completely broken.

Fix importing of external buffers by forwarding all memory access
requests to the exporter.

Tested by the vmw_prime basic_vgem test.
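
The rule the patch applies, distilled from the diff below: CPU access
to an imported object has to go through the exporter's dma-buf ops
rather than the driver's own TTM paths, e.g. for vmap:

    if (obj->import_attach)
            ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
    else
            ret = ttm_bo_vmap(bo, map);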

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 62 +++--
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index 07185c108218..07567d9519ec 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /*
- * Copyright 2021-2023 VMware, Inc.
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -78,6 +79,59 @@ static struct sg_table *vmw_gem_object_get_sg_table(struct 
drm_gem_object *obj)
return drm_prime_pages_to_sg(obj->dev, vmw_tt->dma_ttm.pages, 
vmw_tt->dma_ttm.num_pages);
 }
 
+static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
+   int ret;
+
+   if (obj->import_attach) {
+   ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
+   if (!ret) {
+   if (drm_WARN_ON(obj->dev, map->is_iomem)) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   return -EIO;
+   }
+   }
+   } else {
+   ret = ttm_bo_vmap(bo, map);
+   }
+
+   return ret;
+}
+
+static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   if (obj->import_attach) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   } else {
+   drm_gem_ttm_vunmap(obj, map);
+   }
+}
+
+static int vmw_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+   int ret;
+
+   if (obj->import_attach) {
+   /* Reset both vm_ops and vm_private_data, so we don't end up 
with
+* vm_ops pointing to our implementation if the dma-buf backend
+* doesn't set those fields.
+*/
+   vma->vm_private_data = NULL;
+   vma->vm_ops = NULL;
+
+   ret = dma_buf_mmap(obj->dma_buf, vma, 0);
+
+   /* Drop the reference drm_gem_mmap_obj() acquired.*/
+   if (!ret)
+   drm_gem_object_put(obj);
+
+   return ret;
+   }
+
+   return drm_gem_ttm_mmap(obj, vma);
+}
+
 static const struct vm_operations_struct vmw_vm_ops = {
.pfn_mkwrite = vmw_bo_vm_mkwrite,
.page_mkwrite = vmw_bo_vm_mkwrite,
@@ -94,9 +148,9 @@ static const struct drm_gem_object_funcs 
vmw_gem_object_funcs = {
.pin = vmw_gem_object_pin,
.unpin = vmw_gem_object_unpin,
.get_sg_table = vmw_gem_object_get_sg_table,
-   .vmap = drm_gem_ttm_vmap,
-   .vunmap = drm_gem_ttm_vunmap,
-   .mmap = drm_gem_ttm_mmap,
+   .vmap = vmw_gem_vmap,
+   .vunmap = vmw_gem_vunmap,
+   .mmap = vmw_gem_mmap,
.vm_ops = &vmw_vm_ops,
 };
 
-- 
2.40.1



[PATCH v2 0/4] Fix various buffer mapping/import issues

2024-06-28 Thread Zack Rusin
This small series fixes all known prime/dumb_buffer/buffer dirty
tracking issues. Fixing of dumb-buffers turned out to be a lot more
complex than I wanted it to be. There's not much that can be done
there because the driver has to support old userspace (our Xorg driver
expects those to not be gem buffers and special cases a bunch of
functionality) and new userspace (which expects the handles to be
gem buffers, at least to issue GEM_CLOSE).

The third patch deals with it by making the objects returned from
dumb-buffers both (raw buffers and surfaces referenced by the same
handle), which always works and doesn't require any changes in userspace.

This fixes the known KDE (KWin's) buffer rendering issues.

v2: Fix compute_crc in the second patch, as spotted by Martin

Zack Rusin (4):
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Add basic support for external buffers

 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c  |  26 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  62 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   |  40 +-
 15 files changed, 788 insertions(+), 530 deletions(-)

-- 
2.40.1



Re: [PATCH 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-06-27 Thread Zack Rusin
On Thu, Jun 27, 2024 at 8:37 AM Martin Krastev
 wrote:
>
> On Thu, Jun 27, 2024 at 8:34 AM Zack Rusin  wrote:
> >
> > Fix race issues in virtual crc generation by making sure the surface
> > the code uses for crc computation is properly ref counted.
> >
> > Crc generation was trying to be too clever by allowing the surfaces
> > to go in and out of scope, with the hope of always having some kind
> > of screen present. That's not always the case, in particular during
> > atomic disable, so to make sure the surface, when present, is not
> > being actively destroyed at the same time, hold a reference to it.
> >
> > Signed-off-by: Zack Rusin 
> > Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
> > Cc: Zack Rusin 
> > Cc: Martin Krastev 
> > Cc: Broadcom internal kernel review list 
> > 
> > Cc: dri-devel@lists.freedesktop.org
> > ---
> >  drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 37 +---
> >  1 file changed, 23 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
> > b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> > index 3bfcf671fcd5..c35f7df99977 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> > @@ -130,22 +130,26 @@ crc_generate_worker(struct work_struct *work)
> > return;
> >
> > spin_lock_irq(&du->vkms.crc_state_lock);
> > -   surf = du->vkms.surface;
> > +   surf = vmw_surface_reference(du->vkms.surface);
> > spin_unlock_irq(&du->vkms.crc_state_lock);
> >
> > -   if (vmw_surface_sync(vmw, surf)) {
> > -   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
> > surface!\n");
> > -   return;
> > +   if (surf) {
> > +   if (vmw_surface_sync(vmw, surf)) {
> > +   drm_warn(
> > +   crtc->dev,
> > +   "CRC worker wasn't able to sync the crc 
> > surface!\n");
> > +   return;
> > +   }
> > +
> > +   ret = compute_crc(crtc, surf, &crc32);
> > +   if (ret)
> > +   return;
> > +   vmw_surface_unreference(&surf);
>
> So compute_crc effectively never errs here, so the
> vmw_surface_unreference is a given, but
> wouldn't it correct to have the vmw_surface_unreference precede the
> error-check early-out?

Yes, good catch on both counts. I'll just make compute_crc return void
in v2 instead of unconditionally returning 0, this way we won't have
to deal with multiple unref paths.
z


[PATCH 4/4] drm/vmwgfx: Add basic support for external buffers

2024-06-26 Thread Zack Rusin
Make vmwgfx go through the dma-buf interface to map/unmap imported
buffers. The driver used to try to directly manipulate external
buffers, assuming that everything that was coming to it had to live
in cpu accessible memory. While technically true (because what's in
the VMs is controlled by us), it's semantically completely broken.

Fix importing of external buffers by forwarding all memory access
requests to the exporter.

Tested by the vmw_prime basic_vgem test.

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 62 +++--
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
index 07185c108218..07567d9519ec 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /*
- * Copyright 2021-2023 VMware, Inc.
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -78,6 +79,59 @@ static struct sg_table *vmw_gem_object_get_sg_table(struct 
drm_gem_object *obj)
return drm_prime_pages_to_sg(obj->dev, vmw_tt->dma_ttm.pages, 
vmw_tt->dma_ttm.num_pages);
 }
 
+static int vmw_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
+   int ret;
+
+   if (obj->import_attach) {
+   ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
+   if (!ret) {
+   if (drm_WARN_ON(obj->dev, map->is_iomem)) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   return -EIO;
+   }
+   }
+   } else {
+   ret = ttm_bo_vmap(bo, map);
+   }
+
+   return ret;
+}
+
+static void vmw_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   if (obj->import_attach) {
+   dma_buf_vunmap(obj->import_attach->dmabuf, map);
+   } else {
+   drm_gem_ttm_vunmap(obj, map);
+   }
+}
+
+static int vmw_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+   int ret;
+
+   if (obj->import_attach) {
+   /* Reset both vm_ops and vm_private_data, so we don't end up 
with
+* vm_ops pointing to our implementation if the dma-buf backend
+* doesn't set those fields.
+*/
+   vma->vm_private_data = NULL;
+   vma->vm_ops = NULL;
+
+   ret = dma_buf_mmap(obj->dma_buf, vma, 0);
+
+   /* Drop the reference drm_gem_mmap_obj() acquired.*/
+   if (!ret)
+   drm_gem_object_put(obj);
+
+   return ret;
+   }
+
+   return drm_gem_ttm_mmap(obj, vma);
+}
+
 static const struct vm_operations_struct vmw_vm_ops = {
.pfn_mkwrite = vmw_bo_vm_mkwrite,
.page_mkwrite = vmw_bo_vm_mkwrite,
@@ -94,9 +148,9 @@ static const struct drm_gem_object_funcs 
vmw_gem_object_funcs = {
.pin = vmw_gem_object_pin,
.unpin = vmw_gem_object_unpin,
.get_sg_table = vmw_gem_object_get_sg_table,
-   .vmap = drm_gem_ttm_vmap,
-   .vunmap = drm_gem_ttm_vunmap,
-   .mmap = drm_gem_ttm_mmap,
+   .vmap = vmw_gem_vmap,
+   .vunmap = vmw_gem_vunmap,
+   .mmap = vmw_gem_mmap,
.vm_ops = &vmw_vm_ops,
 };
 
-- 
2.40.1



[PATCH 2/4] drm/vmwgfx: Make sure the screen surface is ref counted

2024-06-26 Thread Zack Rusin
Fix race issues in virtual crc generation by making sure the surface
the code uses for crc computation is properly ref counted.

Crc generation was trying to be too clever by allowing the surfaces
to go in and out of scope, with the hope of always having some kind
of screen present. That's not always the case, in particular during
atomic disable, so to make sure the surface, when present, is not
being actively destroyed at the same time, hold a reference to it.

Signed-off-by: Zack Rusin 
Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
Cc: Zack Rusin 
Cc: Martin Krastev 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 37 +---
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
index 3bfcf671fcd5..c35f7df99977 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
@@ -130,22 +130,26 @@ crc_generate_worker(struct work_struct *work)
return;
 
spin_lock_irq(&du->vkms.crc_state_lock);
-   surf = du->vkms.surface;
+   surf = vmw_surface_reference(du->vkms.surface);
spin_unlock_irq(&du->vkms.crc_state_lock);
 
-   if (vmw_surface_sync(vmw, surf)) {
-   drm_warn(crtc->dev, "CRC worker wasn't able to sync the crc 
surface!\n");
-   return;
+   if (surf) {
+   if (vmw_surface_sync(vmw, surf)) {
+   drm_warn(
+   crtc->dev,
+   "CRC worker wasn't able to sync the crc 
surface!\n");
+   return;
+   }
+
+   ret = compute_crc(crtc, surf, &crc32);
+   if (ret)
+   return;
+   vmw_surface_unreference(&surf);
}
 
-   ret = compute_crc(crtc, surf, &crc32);
-   if (ret)
-   return;
-
spin_lock_irq(&du->vkms.crc_state_lock);
frame_start = du->vkms.frame_start;
frame_end = du->vkms.frame_end;
-   crc_pending = du->vkms.crc_pending;
du->vkms.frame_start = 0;
du->vkms.frame_end = 0;
du->vkms.crc_pending = false;
@@ -164,7 +168,7 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
struct vmw_display_unit *du = container_of(timer, struct 
vmw_display_unit, vkms.timer);
struct drm_crtc *crtc = &du->crtc;
struct vmw_private *vmw = vmw_priv(crtc->dev);
-   struct vmw_surface *surf = NULL;
+   bool has_surface = false;
u64 ret_overrun;
bool locked, ret;
 
@@ -179,10 +183,10 @@ vmw_vkms_vblank_simulate(struct hrtimer *timer)
WARN_ON(!ret);
if (!locked)
return HRTIMER_RESTART;
-   surf = du->vkms.surface;
+   has_surface = du->vkms.surface != NULL;
vmw_vkms_unlock(crtc);
 
-   if (du->vkms.crc_enabled && surf) {
+   if (du->vkms.crc_enabled && has_surface) {
u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
spin_lock(&du->vkms.crc_state_lock);
@@ -336,6 +340,8 @@ vmw_vkms_crtc_cleanup(struct drm_crtc *crtc)
 {
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
 
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
WARN_ON(work_pending(&du->vkms.crc_generator_work));
hrtimer_cancel(&du->vkms.timer);
 }
@@ -497,9 +503,12 @@ vmw_vkms_set_crc_surface(struct drm_crtc *crtc,
struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
struct vmw_private *vmw = vmw_priv(crtc->dev);
 
-   if (vmw->vkms_enabled) {
+   if (vmw->vkms_enabled && du->vkms.surface != surf) {
WARN_ON(atomic_read(&du->vkms.atomic_lock) != 
VMW_VKMS_LOCK_MODESET);
-   du->vkms.surface = surf;
+   if (du->vkms.surface)
+   vmw_surface_unreference(&du->vkms.surface);
+   if (surf)
+   du->vkms.surface = vmw_surface_reference(surf);
}
 }
 
-- 
2.40.1



[PATCH 3/4] drm/vmwgfx: Fix handling of dumb buffers

2024-06-26 Thread Zack Rusin
Dumb buffers can be used in kms but also through prime with gallium's
resource_from_handle. In the second case the dumb buffers can be
rendered by the GPU, whereas with the regular DRM kms interfaces they
are mapped and written to by the CPU. Because the same buffer can
be written to by the GPU and CPU vmwgfx needs to use vmw_surface (object
which properly tracks dirty state of the guest and gpu memory)
instead of vmw_bo (which is just guest side memory).

Furthermore the dumb buffer handles are expected to be gem objects by
a lot of userspace.

Make vmwgfx accept gem handles in prime and kms but internally switch
to vmw_surfaces to properly track the dirty state of the objects between
the GPU and CPU.

Fixes new KWin and KDE on Wayland.

Signed-off-by: Zack Rusin 
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.9+
---
 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 12 files changed, 688 insertions(+), 502 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h 
b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
index b0d87c5f58d8..1ac3cb151b11 100644
--- a/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
+++ b/drivers/gpu/drm/vmwgfx/vmw_surface_cache.h
@@ -1,6 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
- * Copyright 2021 VMware, Inc.
- * SPDX-License-Identifier: GPL-2.0 OR MIT
+ *
+ * Copyright (c) 2021-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person
  * obtaining a copy of this software and associated documentation
@@ -31,6 +33,10 @@
 
 #include 
 
+#define SVGA3D_FLAGS_UPPER_32(svga3d_flags) ((svga3d_flags) >> 32)
+#define SVGA3D_FLAGS_LOWER_32(svga3d_flags) \
+   ((svga3d_flags) & ((uint64_t)U32_MAX))
+
 static inline u32 clamped_umul32(u32 a, u32 b)
 {
uint64_t tmp = (uint64_t) a*b;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index e5eb21a471a6..f6fafb1fc5d8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright © 2011-2023 VMware, Inc., Palo Alto, CA., USA
- * All Rights Reserved.
+ * Copyright (c) 2011-2024 Broadcom. All Rights Reserved. The term
+ * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -28,15 +28,39 @@
 
 #include "vmwgfx_bo.h"
 #include "vmwgfx_drv.h"
-
+#include "vmwgfx_resource_priv.h"
 
 #include 
 
 static void vmw_bo_release(struct vmw_bo *vbo)
 {
+   struct vmw_resource *res;
+
WARN_ON(vbo->tbo.base.funcs &&
kref_read(&vbo->tbo.base.refcount) != 0);
vmw_bo_unmap(vbo);
+
+   xa_destroy(&vbo->detached_resources);
+   WARN_ON(vbo->is_dumb && !vbo->dumb_surface);
+   if (vbo->is_dumb && vbo->dumb_surface) {
+   res = &vbo->dumb_surface->res;
+   WARN_ON(vbo != res->guest_memory_bo);
+   WARN_ON(!res->guest_memory_bo);
+   if (res->guest_memory_bo) {
+   /* Reserve and switch the backing mob. */
+   mutex_lock(&res->dev_priv->cmdbuf_mutex);
+   (void)vmw_resource_reserve(res, false, true);
+   vmw_resource_mob_detach(res);
+   if (res->coherent)
+   vmw_bo_dirty_release(res->guest_memory_bo);
+   res->guest_memory_bo = NULL;
+   res->guest_memory_offset = 0;
+   vmw_resource_unreserve(res, false, false, false, NULL,
+  0);
+   mutex_unlock(&res->dev_priv->cmdbuf_mutex);
+   }
+   vmw_surface_unreference(&vbo->dumb_surface);
+   }

[PATCH 1/4] drm/vmwgfx: Fix a deadlock in dma buf fence polling

2024-06-26 Thread Zack Rusin
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.

vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status; to do that it holds a lock to prevent
the list modifications from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.

dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.

Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those; in particular this fixes KDE stalls/deadlocks.

Signed-off-by: Zack Rusin 
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 5efc6a766f64..76971ef7801a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
 #define VMW_FENCE_WRAP (1 << 31)
 
 struct vmw_fence_manager {
-   int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -120,16 +119,23 @@ static void vmw_fence_goal_write(struct vmw_private *vmw, 
u32 value)
  * objects with actions attached to them.
  */
 
-static void vmw_fence_obj_destroy(struct dma_fence *f)
+static void vmw_fence_obj_destroy_removed(struct dma_fence *f)
 {
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
 
+   WARN_ON(!list_empty(&fence->head));
+   fence->destroy(fence);
+}
+
+static void vmw_fence_obj_destroy(struct dma_fence *f)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
struct vmw_fence_manager *fman = fman_from_fence(fence);
 
spin_lock(&fman->lock);
list_del_init(&fence->head);
-   --fman->num_fence_objects;
spin_unlock(&fman->lock);
fence->destroy(fence);
 }
@@ -257,6 +263,13 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
 };
 
+static const struct dma_fence_ops vmw_fence_ops_removed = {
+   .get_driver_name = vmw_fence_get_driver_name,
+   .get_timeline_name = vmw_fence_get_timeline_name,
+   .enable_signaling = vmw_fence_enable_signaling,
+   .wait = vmw_fence_wait,
+   .release = vmw_fence_obj_destroy_removed,
+};
 
 /*
  * Execute signal actions on fences recently signaled.
@@ -355,7 +368,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager 
*fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
-   ++fman->num_fence_objects;
 
 out_unlock:
spin_unlock(&fman->lock);
@@ -403,7 +415,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
  u32 passed_seqno)
 {
u32 goal_seqno;
-   struct vmw_fence_obj *fence;
+   struct vmw_fence_obj *fence, *next_fence;
 
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +425,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
return false;
 
fman->seqno_valid = false;
-   list_for_each_entry(fence, &fman->fence_list, head) {
+   list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_fence_goal_write(fman->dev_priv,
@@ -471,6 +483,7 @@ static void __vmw_fences_update(struct vmw_fence_manager 
*fman)
 rerun:
list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
+   fence->base.ops = &vmw_fence_ops_removed;
list_del_init(&fence->head);
dma_fence_signal_locked(&fence->base);
INIT_LIST_HEAD(&action_list);
@@ -662,6 +675,7 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 VMW_FENCE_WAIT_TIMEOUT);
 
if (unlikely(ret != 0)) {
+   fence->base.ops = &vmw_fence_ops_removed;
list_del_init(&fence->head);
dma_fence_signal(&fence->base);
INIT_LIST_HEAD(&action_list);
-- 
2.40.1



[PATCH 0/4] Fix various buffer mapping/import issues

2024-06-26 Thread Zack Rusin
This small series fixes all known prime/dumb_buffer/buffer dirty
tracking issues. Fixing of dumb-buffers turned out to be a lot more
complex than I wanted it to be. There's not much that can be done
there because the driver has to support old userspace (our Xorg driver
expects those to not be gem buffers and special cases a bunch of
functionality) and new userspace (which expects the handles to be
gem buffers, at least to issue GEM_CLOSE).
   
The third patch deals with it by making the objects returned from
dumb-buffers both (raw buffers and surfaces referenced by the same
handle), which always works and doesn't require any changes in userspace.
   
This fixes the known KDE (KWin's) buffer rendering issues.

Zack Rusin (4):
  drm/vmwgfx: Fix a deadlock in dma buf fence polling
  drm/vmwgfx: Make sure the screen surface is ref counted
  drm/vmwgfx: Fix handling of dumb buffers
  drm/vmwgfx: Add basic support for external buffers

 drivers/gpu/drm/vmwgfx/vmw_surface_cache.h |  10 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 127 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  40 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c  |  26 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  62 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c| 453 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  14 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  33 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   | 145 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 277 -
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   |  37 +-
 15 files changed, 789 insertions(+), 526 deletions(-)

-- 
2.40.1



Re: [PATCH 1/2] drm/vmwgfx: Fix missing HYPERVISOR_GUEST dependency

2024-06-17 Thread Zack Rusin
On Mon, Jun 17, 2024 at 6:02 AM Borislav Petkov  wrote:
>
> On Mon, Jun 17, 2024 at 11:07:09AM +0200, Borislav Petkov wrote:
> > On Sat, Jun 15, 2024 at 06:25:10PM -0700, Alexey Makhalov wrote:
> > > VMWARE_HYPERCALL alternative will not work as intended without
> > > VMware guest code initialization.
> > >
> > > Reported-by: kernel test robot 
> > > Closes: 
> > > https://lore.kernel.org/oe-kbuild-all/202406152104.fxakp1mb-...@intel.com/
> > > Signed-off-by: Alexey Makhalov 
> > > ---
> > >  drivers/gpu/drm/vmwgfx/Kconfig | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/vmwgfx/Kconfig 
> > > b/drivers/gpu/drm/vmwgfx/Kconfig
> > > index faddae3d6ac2..6f1ac940cbae 100644
> > > --- a/drivers/gpu/drm/vmwgfx/Kconfig
> > > +++ b/drivers/gpu/drm/vmwgfx/Kconfig
> > > @@ -2,7 +2,7 @@
> > >  config DRM_VMWGFX
> > > tristate "DRM driver for VMware Virtual GPU"
> > > depends on DRM && PCI && MMU
> > > -   depends on X86 || ARM64
> > > +   depends on (X86 && HYPERVISOR_GUEST) || ARM64
> > > select DRM_TTM
> > > select DRM_TTM_HELPER
> > > select MAPPING_DIRTY_HELPERS
> > > --
> >
> > Right, I'll queue this soon but it doesn't reproduce here with gcc-11 or 
> > gcc-13.
> > This must be something gcc-9 specific or so...
>
> Actually, that's a DRM patch.
>
> Folks in To: ok to carry this though the tip tree?

That's fine with me. Thanks.

z


Re: [PATCH 3/6] drm/vmwgfx: remove unused struct 'vmw_stdu_dma'

2024-06-05 Thread Zack Rusin
On Mon, Jun 3, 2024 at 1:24 PM Dr. David Alan Gilbert  wrote:
>
> * li...@treblig.org (li...@treblig.org) wrote:
> > From: "Dr. David Alan Gilbert" 
> >
> > 'vmw_stdu_dma' is unused since
> > commit 39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
> > Remove it.
>
> Ping.

Thanks. I pushed it to drm-misc-fixes.

z


Re: [PATCH v3 0/4] Fix memory limits for STDU

2024-05-21 Thread Zack Rusin
On Tue, May 21, 2024 at 2:47 PM Ian Forbes  wrote:
>
> Fixes a bug where modes that are too large for the device are exposed
> and set causing a black screen on boot.
>
> v2: Fixed llvmpipe over-alignment bug.
> v3: Fix comment formatting.
>
> Ian Forbes (4):
>   drm/vmwgfx: Filter modes which exceed graphics memory
>   drm/vmwgfx: 3D disabled should not affect STDU memory limits
>   drm/vmwgfx: Remove STDU logic from generic mode_valid function
>   drm/vmwgfx: Standardize use of kibibytes when logging
>
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   | 19 +++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |  3 --
>  drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c |  4 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c   | 26 ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c  | 45 ++-
>  5 files changed, 60 insertions(+), 37 deletions(-)

Looks great. For the series:
Reviewed-by: Zack Rusin 

z


Re: dma-buf sg mangling

2024-05-15 Thread Zack Rusin
On Tue, May 14, 2024 at 3:00 AM Christian König
 wrote:
>
> On 14.05.24 at 06:15, Zack Rusin wrote:
>
> On Mon, May 13, 2024 at 1:09 PM Christian König
>  wrote:
>
> On 10.05.24 at 18:34, Zack Rusin wrote:
>
> Hey,
>
> so this is a bit of a silly problem but I'd still like to solve it
> properly. The tldr is that virtualized drivers abuse
> drm_driver::gem_prime_import_sg_table (at least vmwgfx and xen do,
> virtgpu and xen punt on it) because there doesn't seem to be a
> universally supported way of converting the sg_table back to a list of
> pages without some form of gart to do it.
>
> Well the whole point is that you should never touch the pages in the
> sg_table in the first place.
>
> The long term plan is actually to completely remove the pages from that
> interface.
>
> First let me clarify that I'm not arguing for access to those pages.
> What I'd like to figure out are precise semantics for all of this
> prime import/map business on drivers that don't have some dedicated
> hardware to turn dma_addr_t array into something readable. If the
> general consensus is that those devices are broken, so be it.
>
>
> Well that stuff is actually surprisingly well documented, see here: 
> https://docs.kernel.org/driver-api/dma-buf.html#cpu-access-to-dma-buffer-objects
>
> It's just that the documentation is written from the perspective of the 
> exporter and userspace, so it's probably not that easy to understand what you 
> should do as an importer.
>
> Maybe we should add a sentence or two to clarify this.
>
> drm_prime_sg_to_page_array is deprecated (for all the right reasons on
> actual hardware) but in our kooky virtualized world we don't have
> garts so what are we supposed to do with the dma_addr_t from the
> imported sg_table? What makes it worse (and definitely breaks xen) is
> that with CONFIG_DMABUF_DEBUG the sg page_link is mangled via
> mangle_sg_table so drm_prime_sg_to_page_array won't even work.
>
> XEN and KVM were actually adjusted to not touch the struct pages any more.
>
> I'm not sure if that work is already upstream or not but I had to
> explain it over and over again why their approach doesn't work.
>
> I'd love to see those patches. Upstream xen definitely still uses it:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/xen/xen_drm_front_gem.c#n263
> which looks pretty broken to me, especially with CONFIG_DMABUF_DEBUG
> because drm_gem_prime_import
> will call dma_buf_map_attachment_unlocked:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_prime.c#n940
> which will call __map_dma_buf
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma-buf/dma-buf.c#n1131
> which will mangle the sg's page_list before calling xen's
> gem_prime_import_sg_table. Which means the drm_prime_sg_to_page_array
> that's used in xen's gem_prime_import_sg_table is silently generating
> broken pages and the entire thing should just be a kernel oops (btw,
> it'd probably be a good idea to not have drm_prime_sg_to_page_array
> generate garbage with CONFIG_DMABUF_DEBUG and print some kind of a
> warning).
>
>
> I honestly didn't followed the discussion to the end, but both Sima and me 
> pointed that out to the XEN people and there were quite a bit of back and 
> forth how to fix it.
>
> Let me try to dig that up.
>
>
> The reason why I'm saying it's a bit of a silly problem is that afaik
> currently it only affects IGT testing with vgem (because the rest of
> external gem objects will be from the virtualized gpu itself which is
> different). But do you have any ideas on what we'd like to do with
> this long term? i.e. we have virtualized gpus without iommu, we have
> sg_table with some memory and we'd like to import it. Do we just
> assume that the sg_table on those configs will always reference cpu
> accessible memory (i.e. if it's external it only comes through
> drm_gem_shmem_object) and just do some horrific abomination like:
> for (i = 0; i < bo->ttm->num_pages; ++i) {
>  phys_addr_t pa = dma_to_phys(vmw->drm.dev, bo->ttm->dma_address[i]);
>  pages[i] = pfn_to_page(PHYS_PFN(pa));
> }
> or add a "i know this is cpu accessible, please demangle" flag to
> drm_prime_sg_to_page_array or try to have some kind of more permanent
> solution?
>
> Well there is no solution for that. Accessing the underlying struct page
> through the sg_table is illegal in the first place.
>
> So the question is not how to access the struct page, but rather why do
> you want to do this?

Re: dma-buf sg mangling

2024-05-13 Thread Zack Rusin
On Mon, May 13, 2024 at 1:09 PM Christian König
 wrote:
>
> On 10.05.24 at 18:34, Zack Rusin wrote:
> > Hey,
> >
> > so this is a bit of a silly problem but I'd still like to solve it
> > properly. The tldr is that virtualized drivers abuse
> > drm_driver::gem_prime_import_sg_table (at least vmwgfx and xen do,
> > virtgpu and xen punt on it) because there doesn't seem to be a
> > universally supported way of converting the sg_table back to a list of
> > pages without some form of gart to do it.
>
> Well the whole point is that you should never touch the pages in the
> sg_table in the first place.
>
> The long term plan is actually to completely remove the pages from that
> interface.

First let me clarify that I'm not arguing for access to those pages.
What I'd like to figure out are precise semantics for all of this
prime import/map business on drivers that don't have some dedicated
hardware to turn dma_addr_t array into something readable. If the
general consensus is that those devices are broken, so be it.

> > drm_prime_sg_to_page_array is deprecated (for all the right reasons on
> > actual hardware) but in our kooky virtualized world we don't have
> > garts so what are we supposed to do with the dma_addr_t from the
> > imported sg_table? What makes it worse (and definitely breaks xen) is
> > that with CONFIG_DMABUF_DEBUG the sg page_link is mangled via
> > mangle_sg_table so drm_prime_sg_to_page_array won't even work.
>
> XEN and KVM were actually adjusted to not touch the struct pages any more.
>
> I'm not sure if that work is already upstream or not but I had to
> explain it over and over again why their approach doesn't work.

I'd love to see those patches. Upstream xen definitely still uses it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/xen/xen_drm_front_gem.c#n263
which looks pretty broken to me, especially with CONFIG_DMABUF_DEBUG
because drm_gem_prime_import
will call dma_buf_map_attachment_unlocked:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_prime.c#n940
which will call __map_dma_buf
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma-buf/dma-buf.c#n1131
which will mangle the sg's page_list before calling xen's
gem_prime_import_sg_table. Which means the drm_prime_sg_to_page_array
that's used in xen's gem_prime_import_sg_table is silently generating
broken pages and the entire thing should just be a kernel oops (btw,
it'd probably be a good idea to not have drm_prime_sg_to_page_array
generate garbage with CONFIG_DMABUF_DEBUG and print some kind of a
warning).

> > The reason why I'm saying it's a bit of a silly problem is that afaik
> > currently it only affects IGT testing with vgem (because the rest of
> > external gem objects will be from the virtualized gpu itself which is
> > different). But do you have any ideas on what we'd like to do with
> > this long term? i.e. we have virtualized gpus without iommu, we have
> > sg_table with some memory and we'd like to import it. Do we just
> > assume that the sg_table on those configs will always reference cpu
> > accessible memory (i.e. if it's external it only comes through
> > drm_gem_shmem_object) and just do some horrific abomination like:
> > for (i = 0; i < bo->ttm->num_pages; ++i) {
> >  phys_addr_t pa = dma_to_phys(vmw->drm.dev, bo->ttm->dma_address[i]);
> >  pages[i] = pfn_to_page(PHYS_PFN(pa));
> > }
> > or add a "i know this is cpu accessible, please demangle" flag to
> > drm_prime_sg_to_page_array or try to have some kind of more permanent
> > solution?
>
> Well there is no solution for that. Accessing the underlying struct page
> through the sg_table is illegal in the first place.
>
> So the question is not how to access the struct page, but rather why do
> you want to do this?

Rob mentioned one usecase, but to be honest, as I mentioned in the
beginning I'd like to have a semantic clarity to the general problem
of going from dma_addr_t to something readable on non iomem resources,
e.g. get the IGT vgem<->vmwgfx tests running, i.e.:
vgem_handle = dumb_buffer_create(vgem_fd, );
dma_buf_fd = prime_handle_to_fd(vgem_fd, vgem_handle);
vmw_handle = prime_fd_to_handle(vmw_fd, dma_buf_fd);
void *ptr = vmw_map_bo(vmw_fd, vmw_handle, ...); <- crash

trying to map that bo will crash because we'll end up in
ttm_bo_vm_fault_reserved which will check whether
bo->resource->bus.is_iomem, which it won't be because every vmwgfx
buffer is just system memory and it will try to access the ttm pages
which don't exist.

dma-buf sg mangling

2024-05-10 Thread Zack Rusin
Hey,

so this is a bit of a silly problem but I'd still like to solve it
properly. The tldr is that virtualized drivers abuse
drm_driver::gem_prime_import_sg_table (at least vmwgfx and xen do,
virtgpu and xen punt on it) because there doesn't seem to be a
universally supported way of converting the sg_table back to a list of
pages without some form of gart to do it.

drm_prime_sg_to_page_array is deprecated (for all the right reasons on
actual hardware) but in our kooky virtualized world we don't have
garts so what are we supposed to do with the dma_addr_t from the
imported sg_table? What makes it worse (and definitely breaks xen) is
that with CONFIG_DMABUF_DEBUG the sg page_link is mangled via
mangle_sg_table so drm_prime_sg_to_page_array won't even work.
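
For reference, the mangling in question looks roughly like this
(paraphrased from drivers/dma-buf/dma-buf.c; treat it as a sketch of
the upstream code rather than an exact quote):

    static void mangle_sg_table(struct sg_table *sg_table)
    {
    #ifdef CONFIG_DMABUF_DEBUG
            int i;
            struct scatterlist *sg;

            /* Mix up the page_link bits to catch importers that touch
             * the underlying struct pages; undone again on unmap. */
            for_each_sgtable_sg(sg_table, sg, i)
                    sg->page_link ^= ~0xffUL;
    #endif
    }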

The reason why I'm saying it's a bit of a silly problem is that afaik
currently it only affects IGT testing with vgem (because the rest of
external gem objects will be from the virtualized gpu itself which is
different). But do you have any ideas on what we'd like to do with
this long term? i.e. we have virtualized gpus without iommu, we have
sg_table with some memory and we'd like to import it. Do we just
assume that the sg_table on those configs will always reference cpu
accessible memory (i.e. if it's external it only comes through
drm_gem_shmem_object) and just do some horrific abomination like:
for (i = 0; i < bo->ttm->num_pages; ++i) {
phys_addr_t pa = dma_to_phys(vmw->drm.dev, bo->ttm->dma_address[i]);
pages[i] = pfn_to_page(PHYS_PFN(pa));
}
or add a "i know this is cpu accessible, please demangle" flag to
drm_prime_sg_to_page_array or try to have some kind of more permanent
solution?

z


Re: [PATCH] drm/vmwgfx: Re-introduce drm_crtc_helper_funcs::prepare

2024-05-03 Thread Zack Rusin
On Fri, May 3, 2024 at 6:29 PM Ian Forbes  wrote:
>
> This function was removed in the referenced fixes commit and caused a
> regression. This is because the presence of this function, even though it
> is a noop, changes the behaviour of disable_outputs in
> drm_atomic_helper.c:1211.
>
> Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation")
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> index 2041c4d48daa..37223f95cbec 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> @@ -409,6 +409,10 @@ static void vmw_stdu_crtc_mode_set_nofb(struct drm_crtc 
> *crtc)
>   crtc->x, crtc->y);
>  }
>
> +static void vmw_stdu_crtc_helper_prepare(struct drm_crtc *crtc)
> +{
> +}
> +
>  static void vmw_stdu_crtc_atomic_disable(struct drm_crtc *crtc,
>  struct drm_atomic_state *state)
>  {
> @@ -1463,6 +1467,7 @@ drm_plane_helper_funcs 
> vmw_stdu_primary_plane_helper_funcs = {
>  };
>
>  static const struct drm_crtc_helper_funcs vmw_stdu_crtc_helper_funcs = {
> +   .prepare = vmw_stdu_crtc_helper_prepare,
> .mode_set_nofb = vmw_stdu_crtc_mode_set_nofb,
> .atomic_check = vmw_du_crtc_atomic_check,
> .atomic_begin = vmw_du_crtc_atomic_begin,
> --
> 2.34.1
>

Thanks, but that doesn't look correct. We do want to make sure that
drm_crtc_vblank_off() is actually called when outputs are disabled.
Since this is my regression it's perfectly fine if you want to hand it
off to me and work on something else. In general you always want to
understand what the patch you're sending is doing before sending
it. In this case it's pretty trivial: the commit you mention says that
it fixes kms_pipe_crc_basic and if you run it with your patch (e.g.
sudo ./kms_pipe_crc_basic --run-subtest disable-crc-after-crtc) you
should notice:
May 03 22:25:05 fedora.local kernel: [ cut here ]
May 03 22:25:05 fedora.local kernel: driver forgot to call drm_crtc_vblank_off()
May 03 22:25:05 fedora.local kernel: WARNING: CPU: 2 PID: 2204 at
drivers/gpu/drm/drm_atomic_helper.c:1232 disable_outputs+0x345/0x350
May 03 22:25:05 fedora.local kernel: Modules linked in: snd_seq_dummy
snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables
snd_seq_midi snd_seq_midi_event qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
sunrpc binfmt_misc snd_ens1371 snd_ac97_codec ac97_bus snd_seq
intel_rapl_msr snd_pcm intel_rapl_common vmw_balloon
intel_uncore_frequency_common isst_if_mbox_msr isst_if_common gameport
snd_rawmidi snd_seq_device rapl snd_timer snd vmw_vmci pcspkr
soundcore i2c_piix4 pktcdvd joydev loop nfnetlink zram vmwgfx
crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
polyval_generic nvme ghash_clmulni_intel nvme_core sha512_ssse3
sha256_ssse3 sha1_ssse3 drm_ttm_helper ttm nvme_auth vmxnet3 serio_raw
ata_generic pata_acpi fuse i2c_dev
May 03 22:25:05 fedora.local kernel: CPU: 2 PID: 2204 Comm:
kms_pipe_crc_ba Not tainted 6.9.0-rc2-vmwgfx #5
May 03 22:25:05 fedora.local kernel: Hardware name: VMware, Inc.
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
11/12/2020
May 03 22:25:05 fedora.local kernel: RIP: 0010:disable_outputs+0x345/0x350
... but in most cases it's not going to be so trivial. Whether you
decide to work on this one yourself or hand it off to me - we don't
want to trade one bug for another here, but fix both of those things.
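
For reference, once a driver turns on vblank support its disable path
needs to end up doing something like this (a sketch of the general
pattern, not the actual vmwgfx code):

static void vmw_stdu_crtc_atomic_disable(struct drm_crtc *crtc,
					 struct drm_atomic_state *state)
{
	/* ... device-specific teardown ... */

	/*
	 * Mandatory once vblank support is enabled; skipping it triggers
	 * the "driver forgot to call drm_crtc_vblank_off()" warning in
	 * disable_outputs() quoted above.
	 */
	drm_crtc_vblank_off(crtc);
}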

z


Re: [PATCH] drm/vmwgfx: Stop using dev_private to store driver data.

2024-05-01 Thread Zack Rusin
On Wed, May 1, 2024 at 8:41 PM Maaz Mombasawala
 wrote:
>
> Currently vmwgfx uses the dev_private opaque pointer in drm_device to store
> driver data in vmw_private struct. Using dev_private is deprecated, and the
> recommendation is to embed struct drm_device in the larger per-device
> structure.
>
> The vmwgfx driver already embeds struct drm_device in its struct
> vmw_private, so switch to using that exclusively and stop using
> dev_private.
>
> Signed-off-by: Maaz Mombasawala 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 --
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 4 ++--
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index bdad93864b98..97e48e93dbbf 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -858,8 +858,6 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
> u32 pci_id)
> bool refuse_dma = false;
> struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev);
>
> -   dev_priv->drm.dev_private = dev_priv;
> -
> vmw_sw_context_init(dev_priv);
>
> mutex_init(&dev_priv->cmdbuf_mutex);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> index 4ecaea0026fc..df89e468a1fc 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> @@ -638,7 +638,7 @@ static inline struct vmw_surface *vmw_res_to_srf(struct 
> vmw_resource *res)
>
>  static inline struct vmw_private *vmw_priv(struct drm_device *dev)
>  {
> -   return (struct vmw_private *)dev->dev_private;
> +   return container_of(dev, struct vmw_private, drm);
>  }
>
>  static inline struct vmw_private *vmw_priv_from_ttm(struct ttm_device *bdev)
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 13b2820cae51..b3f0fb6828de 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -276,7 +276,7 @@ static void vmw_du_put_cursor_mob(struct vmw_cursor_plane 
> *vcp,
>  static int vmw_du_get_cursor_mob(struct vmw_cursor_plane *vcp,
>  struct vmw_plane_state *vps)
>  {
> -   struct vmw_private *dev_priv = vcp->base.dev->dev_private;
> +   struct vmw_private *dev_priv = vmw_priv(vcp->base.dev);
> u32 size = vmw_du_cursor_mob_size(vps->base.crtc_w, vps->base.crtc_h);
> u32 i;
> u32 cursor_max_dim, mob_max_size;
> @@ -515,7 +515,7 @@ void vmw_du_cursor_plane_destroy(struct drm_plane *plane)
> struct vmw_cursor_plane *vcp = vmw_plane_to_vcp(plane);
> u32 i;
>
> -   vmw_cursor_update_position(plane->dev->dev_private, false, 0, 0);
> +   vmw_cursor_update_position(vmw_priv(plane->dev), false, 0, 0);
>
> for (i = 0; i < ARRAY_SIZE(vcp->cursor_mobs); i++)
> vmw_du_destroy_cursor_mob(&vcp->cursor_mobs[i]);
> --
> 2.34.1
>

Looks good.
Reviewed-by: Zack Rusin 

z


Re: [PATCH] drm/vmwgfx: Remove duplicate vmwgfx_vkms.h header

2024-04-30 Thread Zack Rusin
On Tue, Apr 16, 2024 at 9:29 PM Jiapeng Chong
 wrote:
>
> ./drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c: vmwgfx_vkms.h is included more than 
> once.
>
> Reported-by: Abaci Robot 
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8772
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> index 7e93a45948f7..3bfcf671fcd5 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
> @@ -31,7 +31,6 @@
>  #include "vmwgfx_bo.h"
>  #include "vmwgfx_drv.h"
>  #include "vmwgfx_kms.h"
> -#include "vmwgfx_vkms.h"
>
>  #include "vmw_surface_cache.h"
>
> --
> 2.20.1.7.g153144c

Thanks. I pushed it to drm-misc-next.

z


Re: [PATCH] drm/vmwgfx: Fix Legacy Display Unit

2024-04-25 Thread Zack Rusin
On Thu, Apr 25, 2024 at 4:07 PM Ian Forbes  wrote:
>
> Legacy DU was broken by the referenced fixes commit because the placement
> and the busy_placement no longer pointed to the same object. This was later
> fixed indirectly by commit a78a8da51b36c7a0c0c16233f91d60aac03a5a49
> ("drm/ttm: replace busy placement with flags v6") in v6.9.
>
> Fixes: 39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
> Signed-off-by: Ian Forbes 
> Cc:  # v6.4+
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
> index 2bfac3aad7b7..98e73eb0ccf1 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
> @@ -204,6 +204,7 @@ int vmw_bo_pin_in_start_of_vram(struct vmw_private 
> *dev_priv,
>  VMW_BO_DOMAIN_VRAM,
>  VMW_BO_DOMAIN_VRAM);
> buf->places[0].lpfn = PFN_UP(bo->resource->size);
> +   buf->busy_places[0].lpfn = PFN_UP(bo->resource->size);
> ret = ttm_bo_validate(bo, &buf->placement, &ctx);
>
>     /* For some reason we didn't end up at the start of vram */

Looks great. I'll push it through drm-misc-fixes.
Reviewed-by: Zack Rusin 

z


[PATCH] drm/vmwgfx: Fix invalid reads in fence signaled events

2024-04-25 Thread Zack Rusin
Correctly set the length of the drm_event to the size of the structure
that's actually used.

The length of the drm_event was set to the size of the parent structure
instead of to that of the drm_vmw_event_fence which is supposed to be
read. drm_read uses the length parameter to copy the event to the user
space, thus resulting in OOB reads.

Signed-off-by: Zack Rusin 
Fixes: 8b7de6aa8468 ("vmwgfx: Rework fence event action")
Reported-by: zdi-disclosu...@trendmicro.com # ZDI-CAN-23566
Cc: David Airlie 
CC: Daniel Vetter 
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org
Cc:  # v3.4+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 2a0cda324703..5efc6a766f64 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -991,7 +991,7 @@ static int vmw_event_fence_action_create(struct drm_file 
*file_priv,
}
 
event->event.base.type = DRM_VMW_EVENT_FENCE_SIGNALED;
-   event->event.base.length = sizeof(*event);
+   event->event.base.length = sizeof(event->event);
event->event.user_data = user_data;
 
ret = drm_event_reserve_init(dev, file_priv, &event->base, 
&event->event.base);
-- 
2.40.1



Re: [PATCH] drm/ttm: Print the memory decryption status just once

2024-04-18 Thread Zack Rusin
Ping on this one. If we don't want the "_once" then I can quickly
prepare a patch that just removes the logging altogether, because while
useful it's polluting the kernel logs too much right now. Getting a fix
for this into 6.9 would be great.

z

On Mon, Apr 8, 2024 at 1:46 PM Zack Rusin  wrote:
>
> Sorry, apologies to everyone. By accident I replied off the list.
> Redoing it now on the list. More below.
>
> On Mon, Apr 8, 2024 at 12:10 PM Christian König
>  wrote:
> >
> > Am 08.04.24 um 18:04 schrieb Zack Rusin:
> > > On Mon, Apr 8, 2024 at 11:59 AM Christian König
> > >  wrote:
> > >> Am 08.04.24 um 17:56 schrieb Zack Rusin:
> > >>> Stop printing the TT memory decryption status info each time tt is 
> > >>> created
> > >>> and instead print it just once.
> > >>>
> > >>> Reduces the spam in the system logs when running guests with SEV 
> > >>> enabled.
> > >> Do we then really need this in the first place?
> > > Thomas asked for it just to have an indication when those paths are
> > > being used because they could potentially break things pretty bad. I
> > > think it is useful knowing that those paths are hit (but only once).
> > > It makes it pretty easy for me to tell whether bug reports with people
> > > who report black screen can be answered with "the kernel needs to be
> > > upgraded" ;)
> >
> > Sounds reasonable, but my expectation was rather that we would print
> > something on the device level.
> >
> > If that's not feasible for whatever reason than printing it once works
> > as well of course.
>
> TBH, I think it's pretty convenient to have the drm_info in the TT
> just to make sure that when drivers request use_dma_alloc on SEV
> systems TT turns decryption on correctly, i.e. it's a nice sanity
> check when reading the logs. But if you'd prefer it in the driver I
> can move this logic there as well.
>
> z


Re: [PATCH 0/4] Fix memory limits for STDU

2024-04-11 Thread Zack Rusin
On Thu, Apr 11, 2024 at 5:27 PM Ian Forbes  wrote:
>
> Fixes a bug where modes that are too large for the device are exposed
> and set causing a black screen on boot.
>
> Resending as Patchwork did not like my last submission.
>
> Ian Forbes (4):
>   drm/vmwgfx: Filter modes which exceed graphics memory
>   drm/vmwgfx: 3D disabled should not effect STDU memory limits
>   drm/vmwgfx: Remove STDU logic from generic mode_valid function
>   drm/vmwgfx: Standardize use of kibibytes when logging
>
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   | 19 ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |  3 --
>  drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c |  4 +--
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c   | 26 ++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c  | 32 ++-
>  5 files changed, 48 insertions(+), 36 deletions(-)
>

In general that looks great! Two questions:
- with stdu what happens when the mode selected is close to our
limits, the guest is using a hardware cursor and we allocate cursor
mobs?
- with legacy du, is general mode selection with modes close to vram
size working?

And one comment: in series like this one, be careful with Fixes tags if
the patches depend on each other, i.e. the third one depends on the
first but they have different Fixes tags so they're disconnected. It's
a good idea to keep distro kernel maintainers in mind with those and
try to organize the patches in a way that makes it a bit clearer
that #3 depends on #1. It should be fine in this case though.
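
If the dependency ever does matter for stable backports, one existing
convention for spelling out prerequisites (from
Documentation/process/stable-kernel-rules.rst) looks roughly like the
line below - the sha here is obviously made up, since patch #1 isn't
merged yet:

Cc: <stable@vger.kernel.org> # 6.8.x: 123456789abc: drm/vmwgfx: Filter modes which exceed graphics memory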

z


[PATCH v2 5/5] drm/vmwgfx: Sort primary plane formats by order of preference

2024-04-11 Thread Zack Rusin
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by default.

Sort the primary plane formats according to our order of preference.

A nice side-effect of this change is that it makes IGT's kms_atomic
plane-invalid-params pass, because the test picks the first format,
which for vmwgfx was DRM_FORMAT_XRGB1555, and uses fbs with odd sizes,
which make Pixman, which IGT depends on, assert because our 16bpp
formats aren't 32-bit aligned like Pixman requires all formats to be.

Signed-off-by: Zack Rusin 
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v4.12+
Acked-by: Pekka Paalanen 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index bf9931e3a728..bf24f2f0dcfc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -233,10 +233,10 @@ struct vmw_framebuffer_bo {
 
 
 static const uint32_t __maybe_unused vmw_primary_plane_formats[] = {
-   DRM_FORMAT_XRGB1555,
-   DRM_FORMAT_RGB565,
DRM_FORMAT_XRGB8888,
DRM_FORMAT_ARGB8888,
+   DRM_FORMAT_RGB565,
+   DRM_FORMAT_XRGB1555,
 };
 
 static const uint32_t __maybe_unused vmw_cursor_plane_formats[] = {
-- 
2.40.1



[PATCH v2 0/5] drm/vmwgfx: vblank and crc generation support

2024-04-11 Thread Zack Rusin
vmwgfx didn't have support for vblank or crc generation which made it
impossible to use a large number of IGT tests to properly test DRM
functionality in the driver.

This series adds virtual vblank and crc generation support, which allows
running most of IGT and immediately helped fix a number of kms issues
in the driver.

v2: Fix misspelled comment header found by the kernel test robot, a style
fix spotted by Martin and improve commit message in 5/5 as suggested
by Pekka.

Zack Rusin (5):
  drm/vmwgfx: Implement virtual kms
  drm/vmwgfx: Implement virtual crc generation
  drm/vmwgfx: Fix prime import/export
  drm/vmwgfx: Fix crtc's atomic check conditional
  drm/vmwgfx: Sort primary plane formats by order of preference

 drivers/gpu/drm/vmwgfx/Makefile|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c   |  35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |   7 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|   5 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|   7 +
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  32 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c|  51 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  26 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  39 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  28 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   |  42 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  44 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   | 632 +
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h   |  75 +++
 17 files changed, 965 insertions(+), 109 deletions(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h

-- 
2.40.1



[PATCH v2 4/5] drm/vmwgfx: Fix crtc's atomic check conditional

2024-04-11 Thread Zack Rusin
The conditional was supposed to prevent enabling of a crtc state
without a set primary plane. Accidentally it also prevented disabling
a crtc state with a set primary plane. Neither is correct.

Fix the conditional and just driver-warn when a crtc state has been
enabled without a primary plane, which will help debug broken userspace.

Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.

Signed-off-by: Zack Rusin 
Fixes: 06ec41909e31 ("drm/vmwgfx: Add and connect CRTC helper functions")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v4.12+
Reviewed-by: Ian Forbes 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e33e5993d8fc..13b2820cae51 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -931,6 +931,7 @@ int vmw_du_cursor_plane_atomic_check(struct drm_plane 
*plane,
 int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
 struct drm_atomic_state *state)
 {
+   struct vmw_private *vmw = vmw_priv(crtc->dev);
struct drm_crtc_state *new_state = drm_atomic_get_new_crtc_state(state,
 crtc);
struct vmw_display_unit *du = vmw_crtc_to_du(new_state->crtc);
@@ -938,9 +939,13 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
bool has_primary = new_state->plane_mask &
   drm_plane_mask(crtc->primary);
 
-   /* We always want to have an active plane with an active CRTC */
-   if (has_primary != new_state->enable)
-   return -EINVAL;
+   /*
+* This is fine in general, but broken userspace might expect
+* some actual rendering so give a clue as why it's blank.
+*/
+   if (new_state->enable && !has_primary)
+   drm_dbg_driver(&vmw->drm,
+  "CRTC without a primary plane will be blank.\n");
 
 
if (new_state->connector_mask != connector_mask &&
-- 
2.40.1



[PATCH v2 3/5] drm/vmwgfx: Fix prime import/export

2024-04-11 Thread Zack Rusin
vmwgfx never supported prime import of external buffers. Furthermore the
driver exposes two different objects to userspace: vmw_surfaces and
gem buffers, but prime import/export only worked with vmw_surfaces.

Because gem buffers are used through the dumb_buffer interface, this meant
that driver-created buffers couldn't have been prime exported or
imported.

Fix prime import/export. Makes IGT's kms_prime pass.

Signed-off-by: Zack Rusin 
Fixes: 8afa13a0583f ("drm/vmwgfx: Implement DRIVER_GEM")
Cc:  # v6.6+
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c   | 35 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  7 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|  1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  3 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c| 32 
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  | 15 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 44 +++---
 8 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index c52c7bf1485b..717d624e9a05 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -456,8 +456,10 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
.no_wait_gpu = false
};
u32 j, initial_line = dst_offset / dst_stride;
-   struct vmw_bo_blit_line_data d;
+   struct vmw_bo_blit_line_data d = {0};
int ret = 0;
+   struct page **dst_pages = NULL;
+   struct page **src_pages = NULL;
 
/* Buffer objects need to be either pinned or reserved: */
if (!(dst->pin_count))
@@ -477,12 +479,35 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
return ret;
}
 
+   if (!src->ttm->pages && src->ttm->sg) {
+   src_pages = kvmalloc_array(src->ttm->num_pages,
+  sizeof(struct page *), GFP_KERNEL);
+   if (!src_pages)
+   return -ENOMEM;
+   ret = drm_prime_sg_to_page_array(src->ttm->sg, src_pages,
+src->ttm->num_pages);
+   if (ret)
+   goto out;
+   }
+   if (!dst->ttm->pages && dst->ttm->sg) {
+   dst_pages = kvmalloc_array(dst->ttm->num_pages,
+  sizeof(struct page *), GFP_KERNEL);
+   if (!dst_pages) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   ret = drm_prime_sg_to_page_array(dst->ttm->sg, dst_pages,
+dst->ttm->num_pages);
+   if (ret)
+   goto out;
+   }
+
d.mapped_dst = 0;
d.mapped_src = 0;
d.dst_addr = NULL;
d.src_addr = NULL;
-   d.dst_pages = dst->ttm->pages;
-   d.src_pages = src->ttm->pages;
+   d.dst_pages = dst->ttm->pages ? dst->ttm->pages : dst_pages;
+   d.src_pages = src->ttm->pages ? src->ttm->pages : src_pages;
d.dst_num_pages = PFN_UP(dst->resource->size);
d.src_num_pages = PFN_UP(src->resource->size);
d.dst_prot = ttm_io_prot(dst, dst->resource, PAGE_KERNEL);
@@ -504,6 +529,10 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
kunmap_atomic(d.src_addr);
if (d.dst_addr)
kunmap_atomic(d.dst_addr);
+   if (src_pages)
+   kvfree(src_pages);
+   if (dst_pages)
+   kvfree(dst_pages);
 
return ret;
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index bfd41ce3c8f4..e5eb21a471a6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -377,7 +377,8 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
 {
struct ttm_operation_ctx ctx = {
.interruptible = params->bo_type != ttm_bo_type_kernel,
-   .no_wait_gpu = false
+   .no_wait_gpu = false,
+   .resv = params->resv,
};
struct ttm_device *bdev = &dev_priv->bdev;
struct drm_device *vdev = &dev_priv->drm;
@@ -394,8 +395,8 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
 
vmw_bo_placement_set(vmw_bo, params->domain, params->busy_domain);
ret = ttm_bo_init_reserved(bdev, &vmw_bo->tbo, params->bo_type,
-  &vmw_bo->placement, 0, &ctx, NULL,
-  NULL, destroy);
+  &vmw_bo->placement, 0, &ctx,
+  params->

[PATCH v2 2/5] drm/vmwgfx: Implement virtual crc generation

2024-04-11 Thread Zack Rusin
crc checksums are used to validate the output. Normally they're part
of the actual display hardware but on a virtual stack there's nothing
to automatically generate them.

Implement crc generation for the vmwgfx stack. This works only on
screen targets, where it's possible to easily make sure that the
guest side contents of the surface match the host side's output.

Just like the vblank support, crc generation can only be enabled via:
guestinfo.vmwgfx.vkms_enable = "TRUE"
option in the vmx file.

Makes IGT's kms_pipe_crc_basic pass and allows a huge number of other
IGT tests which require CRC generation of the output to actually run
on vmwgfx. Makes it possible to actually validate a lot of the kms and
drm functionality with vmwgfx.

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  |  31 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  22 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 457 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h |  28 +-
 8 files changed, 553 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index e34c48fd25d4..89d3679d2608 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1198,6 +1198,7 @@ static void vmw_driver_unload(struct drm_device *dev)
 
vmw_svga_disable(dev_priv);
 
+   vmw_vkms_cleanup(dev_priv);
vmw_kms_close(dev_priv);
vmw_overlay_close(dev_priv);
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 4f5d7d13c4aa..ddbceaa31b59 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -616,6 +616,7 @@ struct vmw_private {
uint32 *devcaps;
 
bool vkms_enabled;
+   struct workqueue_struct *crc_workq;
 
/*
 * mksGuestStat instance-descriptor and pid arrays
@@ -811,6 +812,7 @@ void vmw_resource_mob_attach(struct vmw_resource *res);
 void vmw_resource_mob_detach(struct vmw_resource *res);
 void vmw_resource_dirty_update(struct vmw_resource *res, pgoff_t start,
   pgoff_t end);
+int vmw_resource_clean(struct vmw_resource *res);
 int vmw_resources_clean(struct vmw_bo *vbo, pgoff_t start,
pgoff_t end, pgoff_t *num_prefault);
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e763cf0e6cfc..e33e5993d8fc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -40,14 +40,14 @@
 
 void vmw_du_init(struct vmw_display_unit *du)
 {
-   hrtimer_init(&du->vkms.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-   du->vkms.timer.function = &vmw_vkms_vblank_simulate;
+   vmw_vkms_crtc_init(&du->crtc);
 }
 
 void vmw_du_cleanup(struct vmw_display_unit *du)
 {
struct vmw_private *dev_priv = vmw_priv(du->primary.dev);
-   hrtimer_cancel(&du->vkms.timer);
+
+   vmw_vkms_crtc_cleanup(&du->crtc);
drm_plane_cleanup(&du->primary);
if (vmw_cmd_supported(dev_priv))
drm_plane_cleanup(&du->cursor.base);
@@ -963,6 +963,7 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
 void vmw_du_crtc_atomic_begin(struct drm_crtc *crtc,
  struct drm_atomic_state *state)
 {
+   vmw_vkms_crtc_atomic_begin(crtc, state);
 }
 
 /**
@@ -2029,6 +2030,29 @@ vmw_kms_create_hotplug_mode_update_property(struct 
vmw_private *dev_priv)
  "hotplug_mode_update", 0, 1);
 }
 
+static void
+vmw_atomic_commit_tail(struct drm_atomic_state *old_state)
+{
+   struct vmw_private *vmw = vmw_priv(old_state->dev);
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *old_crtc_state;
+   int i;
+
+   drm_atomic_helper_commit_tail(old_state);
+
+   if (vmw->vkms_enabled) {
+   for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) {
+   struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
+   (void)old_crtc_state;
+   flush_work(&du->vkms.crc_generator_work);
+   }
+   }
+}
+
+static const struct drm_mode_config_helper_funcs vmw_mode_config_helpers = {
+   .atomic_commit_tail = vmw_atomic_commit_tail,
+};
+
 int vmw_kms_init(struct vmw_private *dev_priv)
 {
struct drm_device *dev = &dev_priv->drm;
@@ -2048,6 +2072,7 @@ int vmw_kms_init(struct vmw_private *dev_priv)
dev->mode_config.max_width = dev_priv->texture_max_width;
dev->mode_config.max_height = dev_priv->texture_max_height;
dev->mod

[PATCH v2 1/5] drm/vmwgfx: Implement virtual kms

2024-04-11 Thread Zack Rusin
By default vmwgfx doesn't support vblanking or crc generation which
makes it impossible to use various IGT tests to validate vmwgfx.
Implement virtual kernel mode setting, which is mainly related to
simulated vblank support.

Code is very similar to amd's vkms and the vkms module itself, except
that it's integrated with vmwgfx's three different output technologies -
legacy, screen object and screen targets.

Makes IGT's kms_vblank pass on vmwgfx and allows a lot of other IGT
tests to run with vmwgfx.

Support for vkms needs to be manually enabled by adding:
guestinfo.vmwgfx.vkms_enable = "TRUE"
somewhere in the vmx file, otherwise it's off by default.

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/Makefile  |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   3 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  |  15 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |   9 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  |  39 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c |  28 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  22 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 193 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h |  53 
 10 files changed, 302 insertions(+), 64 deletions(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h

diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
index e94479d9cd5b..46a4ab688a7f 100644
--- a/drivers/gpu/drm/vmwgfx/Makefile
+++ b/drivers/gpu/drm/vmwgfx/Makefile
@@ -10,6 +10,6 @@ vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_kms.o 
vmwgfx_drv.o \
vmwgfx_simple_resource.o vmwgfx_va.o vmwgfx_blit.o \
vmwgfx_validation.o vmwgfx_page_dirty.o vmwgfx_streamoutput.o \
vmwgfx_devcaps.o ttm_object.o vmwgfx_system_manager.o \
-   vmwgfx_gem.o
+   vmwgfx_gem.o vmwgfx_vkms.o
 
 obj-$(CONFIG_DRM_VMWGFX) := vmwgfx.o
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index c7d90f96d16a..e34c48fd25d4 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -32,6 +32,7 @@
 #include "vmwgfx_binding.h"
 #include "vmwgfx_devcaps.h"
 #include "vmwgfx_mksstat.h"
+#include "vmwgfx_vkms.h"
 #include "ttm_object.h"
 
 #include 
@@ -910,6 +911,8 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
u32 pci_id)
 "Please switch to a supported graphics device to 
avoid problems.");
}
 
+   vmw_vkms_init(dev_priv);
+
ret = vmw_dma_select_mode(dev_priv);
if (unlikely(ret != 0)) {
drm_info(&dev_priv->drm,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 01f41fbb9c3b..4f5d7d13c4aa 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -615,6 +615,8 @@ struct vmw_private {
 
uint32 *devcaps;
 
+   bool vkms_enabled;
+
/*
 * mksGuestStat instance-descriptor and pid arrays
 */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 09214f9339b2..e763cf0e6cfc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -27,6 +27,7 @@
 #include "vmwgfx_kms.h"
 
 #include "vmwgfx_bo.h"
+#include "vmwgfx_vkms.h"
 #include "vmw_surface_cache.h"
 
 #include 
@@ -37,9 +38,16 @@
 #include 
 #include 
 
+void vmw_du_init(struct vmw_display_unit *du)
+{
+   hrtimer_init(&du->vkms.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   du->vkms.timer.function = &vmw_vkms_vblank_simulate;
+}
+
 void vmw_du_cleanup(struct vmw_display_unit *du)
 {
struct vmw_private *dev_priv = vmw_priv(du->primary.dev);
+   hrtimer_cancel(&du->vkms.timer);
drm_plane_cleanup(&du->primary);
if (vmw_cmd_supported(dev_priv))
drm_plane_cleanup(&du->cursor.base);
@@ -957,13 +965,6 @@ void vmw_du_crtc_atomic_begin(struct drm_crtc *crtc,
 {
 }
 
-
-void vmw_du_crtc_atomic_flush(struct drm_crtc *crtc,
- struct drm_atomic_state *state)
-{
-}
-
-
 /**
  * vmw_du_crtc_duplicate_state - duplicate crtc state
  * @crtc: DRM crtc
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index 4a2e3cac1c22..9e83a1553286 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -376,6 +376,12 @@ struct vmw_display_unit {
bool is_implicit;
int set_gui_x;
int set_gui_y;
+
+   struct {
+   struct hrtimer timer;
+   ktime_t period_ns;
+   struct drm_pending_vblank_event *event;
+   } vkms;
 };
 
 #define vmw_crtc_to_du(x) \
@@ -387,6 +393,7 @@ 

Re: [PATCH] drm/ttm: Print the memory decryption status just once

2024-04-08 Thread Zack Rusin
Sorry, apologies to everyone. By accident I replied off the list.
Redoing it now on the list. More below.

On Mon, Apr 8, 2024 at 12:10 PM Christian König
 wrote:
>
> Am 08.04.24 um 18:04 schrieb Zack Rusin:
> > On Mon, Apr 8, 2024 at 11:59 AM Christian König
> >  wrote:
> >> Am 08.04.24 um 17:56 schrieb Zack Rusin:
> >>> Stop printing the TT memory decryption status info each time tt is created
> >>> and instead print it just once.
> >>>
> >>> Reduces the spam in the system logs when running guests with SEV enabled.
> >> Do we then really need this in the first place?
> > Thomas asked for it just to have an indication when those paths are
> > being used because they could potentially break things pretty bad. I
> > think it is useful knowing that those paths are hit (but only once).
> > It makes it pretty easy for me to tell whether bug reports with people
> > who report black screen can be answered with "the kernel needs to be
> > upgraded" ;)
>
> Sounds reasonable, but my expectation was rather that we would print
> something on the device level.
>
> If that's not feasible for whatever reason than printing it once works
> as well of course.

TBH, I think it's pretty convenient to have the drm_info in the TT
just to make sure that when drivers request use_dma_alloc on SEV
systems TT turns decryption on correctly, i.e. it's a nice sanity
check when reading the logs. But if you'd prefer it in the driver I
can move this logic there as well.
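
If we did move it to the driver, for vmwgfx it'd boil down to something
like this in vmw_dma_select_mode (just a sketch, assuming we'd key it
off the selected map mode):

	/* Hypothetical driver-level equivalent of the TT message. */
	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
	    dev_priv->map_mode == vmw_dma_alloc_coherent)
		drm_info(&dev_priv->drm,
			 "TT memory decryption will be enabled.\n");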

z


[PATCH] drm/ttm: Print the memory decryption status just once

2024-04-08 Thread Zack Rusin
Stop printing the TT memory decryption status info each time tt is created
and instead print it just once.

Reduces the spam in the system logs when running guests with SEV enabled.

Signed-off-by: Zack Rusin 
Fixes: 71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when 
needed")
Cc: Thomas Hellström 
Cc: Christian König 
Cc: dri-devel@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org
Cc:  # v5.14+
---
 drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 578a7c37f00b..d776e3f87064 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -92,7 +92,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool 
zero_alloc)
 */
if (bdev->pool.use_dma_alloc && 
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
page_flags |= TTM_TT_FLAG_DECRYPTED;
-   drm_info(ddev, "TT memory decryption enabled.");
+   drm_info_once(ddev, "TT memory decryption enabled.");
}
 
bo->ttm = bdev->funcs->ttm_tt_create(bo, page_flags);
-- 
2.40.1



[PATCH] drm/vmwgfx: Enable DMA mappings with SEV

2024-04-07 Thread Zack Rusin
Enable DMA mappings in vmwgfx after TTM has been fixed in commit
3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for page-based iomem")

This enables full guest-backed memory support and in particular allows
usage of screen targets as the presentation mechanism.

Signed-off-by: Zack Rusin 
Reported-by: Ye Li 
Tested-by: Ye Li 
Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is 
active")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.6+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 41ad13e45554..bdad93864b98 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -667,11 +667,12 @@ static int vmw_dma_select_mode(struct vmw_private 
*dev_priv)
[vmw_dma_map_populate] = "Caching DMA mappings.",
[vmw_dma_map_bind] = "Giving up DMA mappings early."};
 
-   /* TTM currently doesn't fully support SEV encryption. */
-   if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
-   return -EINVAL;
-
-   if (vmw_force_coherent)
+   /*
+* When running with SEV we always want dma mappings, because
+* otherwise ttm tt pool pages will bounce through swiotlb running
+* out of available space.
+*/
+   if (vmw_force_coherent || cc_platform_has(CC_ATTR_MEM_ENCRYPT))
dev_priv->map_mode = vmw_dma_alloc_coherent;
else if (vmw_restrict_iommu)
dev_priv->map_mode = vmw_dma_map_bind;
-- 
2.40.1



Re: [PATCH 1/5] drm/vmwgfx: Implement virtual kms

2024-04-05 Thread Zack Rusin
On Fri, Apr 5, 2024 at 5:53 PM Maaz Mombasawala
 wrote:
>
> On 4/2/24 16:28, Zack Rusin wrote:
> >
> > @@ -541,6 +518,8 @@ static int vmw_ldu_init(struct vmw_private *dev_priv, 
> > unsigned unit)
> >dev_priv->implicit_placement_property,
> >1);
> >
> > + vmw_du_init(&ldu->base);
> > +
> >   return 0;
> >
> >  err_free_unregister:
>
> > @@ -905,6 +900,9 @@ static int vmw_sou_init(struct vmw_private *dev_priv, 
> > unsigned unit)
> >  dev->mode_config.suggested_x_property, 0);
> >   drm_object_attach_property(&connector->base,
> >  dev->mode_config.suggested_y_property, 0);
> > +
> > + vmw_du_init(&sou->base);
> > +
> >   return 0;
> >
> >  err_free_unregister:
>
> > @@ -1575,6 +1576,9 @@ static int vmw_stdu_init(struct vmw_private 
> > *dev_priv, unsigned unit)
> >  dev->mode_config.suggested_x_property, 0);
> >   drm_object_attach_property(&connector->base,
> >  dev->mode_config.suggested_y_property, 0);
> > +
> > + vmw_du_init(&stdu->base);
> > +
> >   return 0;
> >
> >  err_free_unregister:
>
> Shouldn't calls to vmw_du_init() be behind an if(vkms_enabled) condition?

So vmw_du_init is supposed to initialize the base, so that's
unconditional, to match the unconditional vmw_du_cleanup. There's an
argument to be made whether both of those should unconditionally call
vmw_vkms_crtc_init and vmw_vkms_crtc_cleanup. My opinion was that
they're not doing anything costly - they just initialize members - and
having the members of vmw_display_unit initialized whether vkms is
enabled or not still makes sense.
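
Concretely, the unconditional part is just the member init from patch 1
(the timer only fires once armed, which as far as I can tell only
happens on the vkms paths):

void vmw_du_init(struct vmw_display_unit *du)
{
	/* Cheap member init, harmless when vkms is disabled: nothing
	 * here starts the timer, it only gets armed from the vblank
	 * paths. */
	hrtimer_init(&du->vkms.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	du->vkms.timer.function = &vmw_vkms_vblank_simulate;
}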

z


Re: [PATCH 5/5] drm/vmwgfx: Sort primary plane formats by order of preference

2024-04-03 Thread Zack Rusin
On Wed, Apr 3, 2024 at 3:43 AM Pekka Paalanen
 wrote:
>
> On Tue,  2 Apr 2024 19:28:13 -0400
> Zack Rusin  wrote:
>
> > The table of primary plane formats wasn't sorted at all, leading to
> > applications picking our least desirable formats by defaults.
> >
> > Sort the primary plane formats according to our order of preference.
>
> This is good.
>
> > Fixes IGT's kms_atomic plane-invalid-params which assumes that the
> > preferred format is a 32bpp format.
>
> That sounds strange, why would IGT depend on preferred format being
> 32bpp?
>
> That must be an oversight. IGT cannot dictate the format that hardware
> must prefer. XRGB8888 is strongly suggested to be supported in general,
> but why also preferred?

I think it's just a side-effect of the Pixman assert that's failing:
https://cgit.freedesktop.org/drm/igt-gpu-tools/tree/lib/igt_fb.c#n4190
i.e. Pixman assumes everything is 4-byte aligned.
I should have rephrased the message as "IGT assumes that the preferred
fb format is 4-byte aligned because our 16bpp formats are packed and
Pixman can't convert them".
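
Concretely (a sketch of the arithmetic, with a hypothetical odd width):

	/* Pixman requires stride % 4 == 0; IGT inherits that. */
	unsigned int width = 257;                  /* some odd width */
	unsigned int stride_rgb565 = width * 2;    /* 514: not 4-byte aligned */
	unsigned int stride_xrgb8888 = width * 4;  /* 1028: always aligned */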

z


[PATCH 4/5] drm/vmwgfx: Fix crtc's atomic check conditional

2024-04-02 Thread Zack Rusin
The conditional was supposed to prevent enabling of a crtc state
without a set primary plane. Accidentally it also prevented disabling
a crtc state with a set primary plane. Neither is correct.

Fix the conditional and just driver-warn when a crtc state has been
enabled without a primary plane, which will help debug broken userspace.

Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.

Signed-off-by: Zack Rusin 
Fixes: 06ec41909e31 ("drm/vmwgfx: Add and connect CRTC helper functions")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v4.12+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e33e5993d8fc..13b2820cae51 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -931,6 +931,7 @@ int vmw_du_cursor_plane_atomic_check(struct drm_plane 
*plane,
 int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
 struct drm_atomic_state *state)
 {
+   struct vmw_private *vmw = vmw_priv(crtc->dev);
struct drm_crtc_state *new_state = drm_atomic_get_new_crtc_state(state,
 crtc);
struct vmw_display_unit *du = vmw_crtc_to_du(new_state->crtc);
@@ -938,9 +939,13 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
bool has_primary = new_state->plane_mask &
   drm_plane_mask(crtc->primary);
 
-   /* We always want to have an active plane with an active CRTC */
-   if (has_primary != new_state->enable)
-   return -EINVAL;
+   /*
+* This is fine in general, but broken userspace might expect
+* some actual rendering so give a clue as why it's blank.
+*/
+   if (new_state->enable && !has_primary)
+   drm_dbg_driver(&vmw->drm,
+  "CRTC without a primary plane will be blank.\n");
 
 
if (new_state->connector_mask != connector_mask &&
-- 
2.40.1



[PATCH 5/5] drm/vmwgfx: Sort primary plane formats by order of preference

2024-04-02 Thread Zack Rusin
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by default.

Sort the primary plane formats according to our order of preference.
Fixes IGT's kms_atomic plane-invalid-params which assumes that the
preferred format is a 32bpp format.

Signed-off-by: Zack Rusin 
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v4.12+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index bf9931e3a728..bf24f2f0dcfc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -233,10 +233,10 @@ struct vmw_framebuffer_bo {
 
 
 static const uint32_t __maybe_unused vmw_primary_plane_formats[] = {
-   DRM_FORMAT_XRGB1555,
-   DRM_FORMAT_RGB565,
DRM_FORMAT_XRGB8888,
DRM_FORMAT_ARGB8888,
+   DRM_FORMAT_RGB565,
+   DRM_FORMAT_XRGB1555,
 };
 
 static const uint32_t __maybe_unused vmw_cursor_plane_formats[] = {
-- 
2.40.1



[PATCH 3/5] drm/vmwgfx: Fix prime import/export

2024-04-02 Thread Zack Rusin
vmwgfx never supported prime import of external buffers. Furthermore the
driver exposes two different objects to userspace: vmw_surfaces and
gem buffers, but prime import/export only worked with vmw_surfaces.

Because gem buffers are used through the dumb_buffer interface, this meant
that driver-created buffers couldn't have been prime exported or
imported.

Fix prime import/export. Makes IGT's kms_prime pass.

Signed-off-by: Zack Rusin 
Fixes: 8afa13a0583f ("drm/vmwgfx: Implement DRIVER_GEM")
Cc:  # v6.6+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c   | 35 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |  7 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |  2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|  1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|  3 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c| 32 
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  | 15 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 44 +++---
 8 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index c52c7bf1485b..717d624e9a05 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -456,8 +456,10 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
.no_wait_gpu = false
};
u32 j, initial_line = dst_offset / dst_stride;
-   struct vmw_bo_blit_line_data d;
+   struct vmw_bo_blit_line_data d = {0};
int ret = 0;
+   struct page **dst_pages = NULL;
+   struct page **src_pages = NULL;
 
/* Buffer objects need to be either pinned or reserved: */
if (!(dst->pin_count))
@@ -477,12 +479,35 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
return ret;
}
 
+   if (!src->ttm->pages && src->ttm->sg) {
+   src_pages = kvmalloc_array(src->ttm->num_pages,
+  sizeof(struct page *), GFP_KERNEL);
+   if (!src_pages)
+   return -ENOMEM;
+   ret = drm_prime_sg_to_page_array(src->ttm->sg, src_pages,
+src->ttm->num_pages);
+   if (ret)
+   goto out;
+   }
+   if (!dst->ttm->pages && dst->ttm->sg) {
+   dst_pages = kvmalloc_array(dst->ttm->num_pages,
+  sizeof(struct page *), GFP_KERNEL);
+   if (!dst_pages) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   ret = drm_prime_sg_to_page_array(dst->ttm->sg, dst_pages,
+dst->ttm->num_pages);
+   if (ret)
+   goto out;
+   }
+
d.mapped_dst = 0;
d.mapped_src = 0;
d.dst_addr = NULL;
d.src_addr = NULL;
-   d.dst_pages = dst->ttm->pages;
-   d.src_pages = src->ttm->pages;
+   d.dst_pages = dst->ttm->pages ? dst->ttm->pages : dst_pages;
+   d.src_pages = src->ttm->pages ? src->ttm->pages : src_pages;
d.dst_num_pages = PFN_UP(dst->resource->size);
d.src_num_pages = PFN_UP(src->resource->size);
d.dst_prot = ttm_io_prot(dst, dst->resource, PAGE_KERNEL);
@@ -504,6 +529,10 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
kunmap_atomic(d.src_addr);
if (d.dst_addr)
kunmap_atomic(d.dst_addr);
+   if (src_pages)
+   kvfree(src_pages);
+   if (dst_pages)
+   kvfree(dst_pages);
 
return ret;
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index bfd41ce3c8f4..e5eb21a471a6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -377,7 +377,8 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
 {
struct ttm_operation_ctx ctx = {
.interruptible = params->bo_type != ttm_bo_type_kernel,
-   .no_wait_gpu = false
+   .no_wait_gpu = false,
+   .resv = params->resv,
};
struct ttm_device *bdev = &dev_priv->bdev;
struct drm_device *vdev = &dev_priv->drm;
@@ -394,8 +395,8 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
 
vmw_bo_placement_set(vmw_bo, params->domain, params->busy_domain);
ret = ttm_bo_init_reserved(bdev, &vmw_bo->tbo, params->bo_type,
-  &vmw_bo->placement, 0, &ctx, NULL,
-  NULL, destroy);
+  &vmw_bo->placement, 0, &ctx,
+  params->sg, params->resv, destroy);

[PATCH 2/5] drm/vmwgfx: Implement virtual crc generation

2024-04-02 Thread Zack Rusin
crc checksums are used to validate the output. Normally they're part
of the actual display hardware but on a virtual stack there's nothing
to automatically generate them.

Implement crc generation for the vmwgfx stack. This works only on
screen targets, where it's possible to easily make sure that the
guest side contents of the surface match the host side's output.

Just like the vblank support, crc generation can only be enabled via:
guestinfo.vmwgfx.vkms_enable = "TRUE"
option in the vmx file.

Makes IGT's kms_pipe_crc_basic pass and allows a huge number of other
IGT tests which require CRC generation of the output to actually run
on vmwgfx. Makes it possible to actually validate a lot of the kms and
drm functionality with vmwgfx.

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  |  31 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  22 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 453 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h |  28 +-
 8 files changed, 550 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index e34c48fd25d4..89d3679d2608 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1198,6 +1198,7 @@ static void vmw_driver_unload(struct drm_device *dev)
 
vmw_svga_disable(dev_priv);
 
+   vmw_vkms_cleanup(dev_priv);
vmw_kms_close(dev_priv);
vmw_overlay_close(dev_priv);
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 4f5d7d13c4aa..ddbceaa31b59 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -616,6 +616,7 @@ struct vmw_private {
uint32 *devcaps;
 
bool vkms_enabled;
+   struct workqueue_struct *crc_workq;
 
/*
 * mksGuestStat instance-descriptor and pid arrays
@@ -811,6 +812,7 @@ void vmw_resource_mob_attach(struct vmw_resource *res);
 void vmw_resource_mob_detach(struct vmw_resource *res);
 void vmw_resource_dirty_update(struct vmw_resource *res, pgoff_t start,
   pgoff_t end);
+int vmw_resource_clean(struct vmw_resource *res);
 int vmw_resources_clean(struct vmw_bo *vbo, pgoff_t start,
pgoff_t end, pgoff_t *num_prefault);
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e763cf0e6cfc..e33e5993d8fc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -40,14 +40,14 @@
 
 void vmw_du_init(struct vmw_display_unit *du)
 {
-   hrtimer_init(&du->vkms.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-   du->vkms.timer.function = &vmw_vkms_vblank_simulate;
+   vmw_vkms_crtc_init(&du->crtc);
 }
 
 void vmw_du_cleanup(struct vmw_display_unit *du)
 {
struct vmw_private *dev_priv = vmw_priv(du->primary.dev);
-   hrtimer_cancel(&du->vkms.timer);
+
+   vmw_vkms_crtc_cleanup(&du->crtc);
drm_plane_cleanup(&du->primary);
if (vmw_cmd_supported(dev_priv))
drm_plane_cleanup(&du->cursor.base);
@@ -963,6 +963,7 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
 void vmw_du_crtc_atomic_begin(struct drm_crtc *crtc,
  struct drm_atomic_state *state)
 {
+   vmw_vkms_crtc_atomic_begin(crtc, state);
 }
 
 /**
@@ -2029,6 +2030,29 @@ vmw_kms_create_hotplug_mode_update_property(struct 
vmw_private *dev_priv)
  "hotplug_mode_update", 0, 1);
 }
 
+static void
+vmw_atomic_commit_tail(struct drm_atomic_state *old_state)
+{
+   struct vmw_private *vmw = vmw_priv(old_state->dev);
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *old_crtc_state;
+   int i;
+
+   drm_atomic_helper_commit_tail(old_state);
+
+   if (vmw->vkms_enabled) {
+   for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) {
+   struct vmw_display_unit *du = vmw_crtc_to_du(crtc);
+   (void)old_crtc_state;
+   flush_work(&du->vkms.crc_generator_work);
+   }
+   }
+}
+
+static const struct drm_mode_config_helper_funcs vmw_mode_config_helpers = {
+   .atomic_commit_tail = vmw_atomic_commit_tail,
+};
+
 int vmw_kms_init(struct vmw_private *dev_priv)
 {
struct drm_device *dev = &dev_priv->drm;
@@ -2048,6 +2072,7 @@ int vmw_kms_init(struct vmw_private *dev_priv)
dev->mode_config.max_width = dev_priv->texture_max_width;
dev->mode_config.max_height = dev_priv->texture_max_height;
dev->mod

[PATCH 1/5] drm/vmwgfx: Implement virtual kms

2024-04-02 Thread Zack Rusin
By default vmwgfx doesn't support vblanking or crc generation which
makes it impossible to use various IGT tests to validate vmwgfx.
Implement virtual kernel mode setting, which is mainly related to
simulated vblank support.

Code is very similar to amd's vkms and the vkms module itself, except
that it's integrated with vmwgfx's three different output technologies -
legacy, screen object and screen targets.

Makes IGT's kms_vblank pass on vmwgfx and allows a lot of other IGT
tests to run with vmwgfx.

Support for vkms needs to be manually enabled by adding:
guestinfo.vmwgfx.vkms_enable = "TRUE"
somewhere in the vmx file, otherwise it's off by default.

Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/Makefile  |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   3 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  |  15 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |   9 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  |  39 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c |  28 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  22 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c | 193 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h |  53 
 10 files changed, 302 insertions(+), 64 deletions(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h

diff --git a/drivers/gpu/drm/vmwgfx/Makefile b/drivers/gpu/drm/vmwgfx/Makefile
index e94479d9cd5b..46a4ab688a7f 100644
--- a/drivers/gpu/drm/vmwgfx/Makefile
+++ b/drivers/gpu/drm/vmwgfx/Makefile
@@ -10,6 +10,6 @@ vmwgfx-y := vmwgfx_execbuf.o vmwgfx_gmr.o vmwgfx_kms.o 
vmwgfx_drv.o \
vmwgfx_simple_resource.o vmwgfx_va.o vmwgfx_blit.o \
vmwgfx_validation.o vmwgfx_page_dirty.o vmwgfx_streamoutput.o \
vmwgfx_devcaps.o ttm_object.o vmwgfx_system_manager.o \
-   vmwgfx_gem.o
+   vmwgfx_gem.o vmwgfx_vkms.o
 
 obj-$(CONFIG_DRM_VMWGFX) := vmwgfx.o
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index c7d90f96d16a..e34c48fd25d4 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -32,6 +32,7 @@
 #include "vmwgfx_binding.h"
 #include "vmwgfx_devcaps.h"
 #include "vmwgfx_mksstat.h"
+#include "vmwgfx_vkms.h"
 #include "ttm_object.h"
 
 #include 
@@ -910,6 +911,8 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
u32 pci_id)
 "Please switch to a supported graphics device to 
avoid problems.");
}
 
+   vmw_vkms_init(dev_priv);
+
ret = vmw_dma_select_mode(dev_priv);
if (unlikely(ret != 0)) {
drm_info(&dev_priv->drm,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 01f41fbb9c3b..4f5d7d13c4aa 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -615,6 +615,8 @@ struct vmw_private {
 
uint32 *devcaps;
 
+   bool vkms_enabled;
+
/*
 * mksGuestStat instance-descriptor and pid arrays
 */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 09214f9339b2..e763cf0e6cfc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -27,6 +27,7 @@
 #include "vmwgfx_kms.h"
 
 #include "vmwgfx_bo.h"
+#include "vmwgfx_vkms.h"
 #include "vmw_surface_cache.h"
 
 #include 
@@ -37,9 +38,16 @@
 #include 
 #include 
 
+void vmw_du_init(struct vmw_display_unit *du)
+{
+   hrtimer_init(&du->vkms.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   du->vkms.timer.function = &vmw_vkms_vblank_simulate;
+}
+
 void vmw_du_cleanup(struct vmw_display_unit *du)
 {
struct vmw_private *dev_priv = vmw_priv(du->primary.dev);
+   hrtimer_cancel(&du->vkms.timer);
drm_plane_cleanup(&du->primary);
if (vmw_cmd_supported(dev_priv))
drm_plane_cleanup(&du->cursor.base);
@@ -957,13 +965,6 @@ void vmw_du_crtc_atomic_begin(struct drm_crtc *crtc,
 {
 }
 
-
-void vmw_du_crtc_atomic_flush(struct drm_crtc *crtc,
- struct drm_atomic_state *state)
-{
-}
-
-
 /**
  * vmw_du_crtc_duplicate_state - duplicate crtc state
  * @crtc: DRM crtc
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index 4a2e3cac1c22..9e83a1553286 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -376,6 +376,12 @@ struct vmw_display_unit {
bool is_implicit;
int set_gui_x;
int set_gui_y;
+
+   struct {
+   struct hrtimer timer;
+   ktime_t period_ns;
+   struct drm_pending_vblank_event *event;
+   } vkms;
 };
 
 #define vmw_crtc_to_du(x) \
@@ -387,6 +393,7 @@ 

[PATCH 0/5] drm/vmwgfx: vblank and crc generation support

2024-04-02 Thread Zack Rusin
vmwgfx didn't have support for vblank or crc generation which made it
impossible to use a large number of IGT tests to properly test DRM
functionality in the driver.

This series adds virtual vblank and crc generation support, which allows
running most of IGT and immediately helped fix a number of kms issues
in the driver.

Zack Rusin (5):
  drm/vmwgfx: Implement virtual kms
  drm/vmwgfx: Implement virtual crc generation
  drm/vmwgfx: Fix prime import/export
  drm/vmwgfx: Fix crtc's atomic check conditional
  drm/vmwgfx: Sort primary plane formats by order of preference

 drivers/gpu/drm/vmwgfx/Makefile|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c   |  35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c |   7 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|   5 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h|   7 +
 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c|  32 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c|  51 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h|  26 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c|  39 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_prime.c  |  15 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  32 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c   |  28 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c   |  42 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  44 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c   | 630 +
 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h   |  75 +++
 17 files changed, 963 insertions(+), 109 deletions(-)
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.c
 create mode 100644 drivers/gpu/drm/vmwgfx/vmwgfx_vkms.h

-- 
2.40.1



Re: [PATCH] drm/vmwgfx: Filter modes which exceed graphics memory

2024-04-02 Thread Zack Rusin
On Mon, Apr 1, 2024 at 4:35 PM Ian Forbes  wrote:
>
> SVGA requires individual surfaces to fit within graphics memory
> (max_mob_pages) which means that modes with a final buffer size that would
> exceed graphics memory must be pruned otherwise creation will fail.
>
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.
>
> Fixes: d947d1b71deb ("drm/vmwgfx: Add and connect connector helper function")
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> index 3c8414a13dba..49583b186a7d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
> @@ -830,7 +830,37 @@ static void vmw_stdu_connector_destroy(struct 
> drm_connector *connector)
> vmw_stdu_destroy(vmw_connector_to_stdu(connector));
>  }
>
> +static enum drm_mode_status
> +vmw_stdu_connector_mode_valid(struct drm_connector *connector,
> + struct drm_display_mode *mode)
> +{
> +   enum drm_mode_status ret;
> +   struct drm_device *dev = connector->dev;
> +   struct vmw_private *dev_priv = vmw_priv(dev);
> +   u64 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +   u64 required_mem = mode->hdisplay * assumed_cpp * mode->vdisplay;
> +
> +   ret = drm_mode_validate_size(mode, dev_priv->stdu_max_width,
> +dev_priv->stdu_max_height);
> +   if (ret != MODE_OK)
> +   return ret;
> +
> +   ret = drm_mode_validate_size(mode, dev_priv->texture_max_width,
> +dev_priv->texture_max_height);
> +   if (ret != MODE_OK)
> +   return ret;
>
> +   if (required_mem > dev_priv->max_primary_mem)
> +   return MODE_MEM;
> +
> +   if (required_mem > dev_priv->max_mob_pages * PAGE_SIZE)
> +   return MODE_MEM;
> +
> +   if (required_mem > dev_priv->max_mob_size)
> +   return MODE_MEM;
> +
> +   return MODE_OK;
> +}
>
>  static const struct drm_connector_funcs vmw_stdu_connector_funcs = {
> .dpms = vmw_du_connector_dpms,
> @@ -846,7 +876,7 @@ static const struct drm_connector_funcs 
> vmw_stdu_connector_funcs = {
>  static const struct
>  drm_connector_helper_funcs vmw_stdu_connector_helper_funcs = {
> .get_modes = vmw_connector_get_modes,
> -   .mode_valid = vmw_connector_mode_valid
> +   .mode_valid = vmw_stdu_connector_mode_valid
>  };
>
>
> --
> 2.34.1
>

This looks like a great start. One improvement I'd suggest is to take
a look at
bora/vmcore/frobos/test/common/svga/1523068-svga-screen-limits/main.c,
where those computations are spelled out a bit more verbosely. I'd
suggest following them because those are being tested all the time.
It'd be great if we also covered the multimon case here, but it's not
our main concern.
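
To illustrate what I mean by spelled out, roughly (an untested sketch,
reusing the names from your patch):

	u64 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
	u64 pitch = (u64)mode->hdisplay * assumed_cpp;	/* bytes per scanline */
	u64 required_mem = pitch * mode->vdisplay;	/* bytes for the whole fb */

	/* The fb has to fit within resident graphics memory... */
	if (required_mem > (u64)dev_priv->max_mob_pages * PAGE_SIZE)
		return MODE_MEM;

	/* ...and within a single MOB. */
	if (required_mem > dev_priv->max_mob_size)
		return MODE_MEM;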

The second thing that we'd want to adjust is that if we're not using
vmw_connector_mode_valid then we need to remove the stdu paths from
it.

Finally, I'd suggest making this a series, i.e. including all the
changes we've talked about, like fixing all of the display
technologies, disabling 3d, etc. IIRC we talked about a priority list
among those at some point.

z


Re: [PATCH] drm/vmwgfx: Don't memcmp equivalent pointers

2024-03-28 Thread Zack Rusin
On Thu, Mar 28, 2024 at 3:31 PM Ian Forbes  wrote:
>
> These pointers are frequently the same and memcmp does not compare the
> pointers before comparing their contents, so this was wasting cycles
> comparing 16 KiB of memory that will always be equal.
>
> Fixes: bb6780aa5a1d9 ("drm/vmwgfx: Diff cursors when using cmds")
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index cd4925346ed4..fbcce84e2f4c 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -216,7 +216,7 @@ static bool vmw_du_cursor_plane_has_changed(struct 
> vmw_plane_state *old_vps,
> new_image = vmw_du_cursor_plane_acquire_image(new_vps);
>
> changed = false;
> -   if (old_image && new_image)
> +   if (old_image && new_image && (old_image != new_image))
> changed = memcmp(old_image, new_image, size) != 0;
>
> return changed;
> --
> 2.34.1
>

The patch looks good but please use "dim checkpatch" and fix the
issues it found. For the "Fixes:" line you also want to use "dim
fixes".

z


Re: [PATCH 01/43] drm/fbdev-generic: Do not set physical framebuffer address

2024-03-17 Thread Zack Rusin
On Tue, Mar 12, 2024 at 11:48 AM Thomas Zimmermann  wrote:
>
> Framebuffer memory is allocated via vmalloc() from non-contiguous
> physical pages. The physical framebuffer start address is therefore
> meaningless. Do not set it.
>
> The value is not used within the kernel and only exported to userspace
> on dedicated ARM configs. No functional change is expected.
>
> Signed-off-by: Thomas Zimmermann 
> Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering")
> Cc: Thomas Zimmermann 
> Cc: Javier Martinez Canillas 
> Cc: Zack Rusin 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc:  # v6.4+
> ---
>  drivers/gpu/drm/drm_fbdev_generic.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
> b/drivers/gpu/drm/drm_fbdev_generic.c
> index d647d89764cb9..b4659cd6285ab 100644
> --- a/drivers/gpu/drm/drm_fbdev_generic.c
> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
> @@ -113,7 +113,6 @@ static int drm_fbdev_generic_helper_fb_probe(struct 
> drm_fb_helper *fb_helper,
> /* screen */
> info->flags |= FBINFO_VIRTFB | FBINFO_READS_FAST;
> info->screen_buffer = screen_buffer;
> -   info->fix.smem_start = 
> page_to_phys(vmalloc_to_page(info->screen_buffer));
> info->fix.smem_len = screen_size;
>
> /* deferred I/O */
> --
> 2.44.0
>

Good idea. I think given that drm_leak_fbdev_smem is off by default we
could remove the setting of smem_start by all of the in-tree drm
drivers (they all have open source userspace that won't mess around
with fbdev fb) - it will be reset to 0 anyway. Actually, I wonder if
we still need drm_leak_fbdev_smem at all...

Reviewed-by: Zack Rusin 

z


Re: [PATCH] vmwgfx: Create debugfs ttm_resource_manager entry only if needed

2024-03-13 Thread Zack Rusin
On Tue, Mar 12, 2024 at 5:36 AM Jocelyn Falempe  wrote:
>
> The driver creates /sys/kernel/debug/dri/0/mob_ttm even when the
> corresponding ttm_resource_manager is not allocated.
> This leads to a crash when trying to read from this file.
>
> Add a check to create mob_ttm, system_mob_ttm, and gmr_ttm debug file
> only when the corresponding ttm_resource_manager is allocated.
>
> crash> bt
> PID: 3133409  TASK: 8fe4834a5000  CPU: 3  COMMAND: "grep"
>  #0 [b954506b3b20] machine_kexec at b2a6bec3
>  #1 [b954506b3b78] __crash_kexec at b2bb598a
>  #2 [b954506b3c38] crash_kexec at b2bb68c1
>  #3 [b954506b3c50] oops_end at b2a2a9b1
>  #4 [b954506b3c70] no_context at b2a7e913
>  #5 [b954506b3cc8] __bad_area_nosemaphore at b2a7ec8c
>  #6 [b954506b3d10] do_page_fault at b2a7f887
>  #7 [b954506b3d40] page_fault at b360116e
> [exception RIP: ttm_resource_manager_debug+0x11]
> RIP: c04afd11  RSP: b954506b3df0  RFLAGS: 00010246
> RAX: 8fe41a6d1200  RBX:   RCX: 0940
> RDX:   RSI: c04b4338  RDI: 
> RBP: b954506b3e08   R8: 8fee3ffad000   R9: 
> R10: 8fe41a76a000  R11: 0001  R12: 
> R13: 0001  R14: 8fe5bb6f3900  R15: 8fe41a6d1200
> ORIG_RAX:   CS: 0010  SS: 0018
>  #8 [b954506b3e00] ttm_resource_manager_show at c04afde7 [ttm]
>  #9 [b954506b3e30] seq_read at b2d8f9f3
> RIP: 7f4c4eda8985  RSP: 7ffdbba9e9f8  RFLAGS: 0246
> RAX: ffda  RBX: 0037e000  RCX: 7f4c4eda8985
> RDX: 0037e000  RSI: 7f4c41573000  RDI: 0003
> RBP: 0037e000   R8:    R9: 0037fe30
> R10:   R11: 0246  R12: 7f4c41573000
> R13: 0003  R14: 7f4c41572010  R15: 0003
> ORIG_RAX:   CS: 0033  SS: 002b
>
> Signed-off-by: Jocelyn Falempe 
> Fixes: af4a25bbe5e7 ("drm/vmwgfx: Add debugfs entries for various ttm 
> resource managers")
> Cc: 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index d3e308fdfd5b..c7d90f96d16a 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -1444,12 +1444,15 @@ static void vmw_debugfs_resource_managers_init(struct 
> vmw_private *vmw)
> root, "system_ttm");
> ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, 
> TTM_PL_VRAM),
> root, "vram_ttm");
> -   ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, 
> VMW_PL_GMR),
> -   root, "gmr_ttm");
> -   ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, 
> VMW_PL_MOB),
> -   root, "mob_ttm");
> -   ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, 
> VMW_PL_SYSTEM),
> -   root, "system_mob_ttm");
> +   if (vmw->has_gmr)
> +   
> ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, VMW_PL_GMR),
> +   root, "gmr_ttm");
> +   if (vmw->has_mob) {
> +   
> ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, VMW_PL_MOB),
> +   root, "mob_ttm");
> +   
> ttm_resource_manager_create_debugfs(ttm_manager_type(&vmw->bdev, 
> VMW_PL_SYSTEM),
> +   root, "system_mob_ttm");
> +   }
>  }
>
>  static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val,
>
> base-commit: b33651a5c98dbd5a919219d8c129d0674ef74299
> --
> 2.44.0
>

Thanks! That looks great. I can push it through drm-misc-fixes.

Reviewed-by: Zack Rusin 

z


Re: [PATCH 00/13] drm: Fix reservation locking for pin/unpin and console

2024-02-27 Thread Zack Rusin
/drm_client.h|  10 +++
>  include/drm/drm_gem.h   |   3 +
>  include/drm/drm_gem_shmem_helper.h  |   7 +-
>  21 files changed, 265 insertions(+), 172 deletions(-)
>
>
> base-commit: 7291e2e67dff0ff573900266382c9c9248a7dea5
> prerequisite-patch-id: bdfa0e6341b30cc9d7647172760b3473007c1216
> prerequisite-patch-id: bc27ac702099f481890ae2c7c4a9c531f4a62d64
> prerequisite-patch-id: f5d4bf16dc45334254527c2e31ee21ba4582761c
> prerequisite-patch-id: 734c87e610747779aa41be12eb9e4c984bdfa743
> prerequisite-patch-id: 0aa359f6144c4015c140c8a6750be19099c676fb
> prerequisite-patch-id: c67e5d886a47b7d0266d81100837557fda34cb24
> prerequisite-patch-id: cbc453ee02fae02af22fbfdce56ab732c7a88c36
> --
> 2.43.2
>

That's a really nice cleanup! I already gave a r-b for 9/13. For the rest:
Acked-by: Zack Rusin 

z


Re: [PATCH 09/13] drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()

2024-02-27 Thread Zack Rusin
m_object *obj)
>  {
> struct qxl_bo *bo = gem_to_qxl_bo(obj);
> -   int r;
>
> -   r = qxl_bo_reserve(bo);
> -   if (r)
> -   return r;
> -   r = qxl_bo_pin_locked(bo);
> -   qxl_bo_unreserve(bo);
> -
> -   return r;
> +   return qxl_bo_pin_locked(bo);
>  }
>
>  void qxl_gem_prime_unpin(struct drm_gem_object *obj)
>  {
> struct qxl_bo *bo = gem_to_qxl_bo(obj);
> -   int r;
>
> -   r = qxl_bo_reserve(bo);
> -   if (r)
> -   return;
> qxl_bo_unpin_locked(bo);
> -   qxl_bo_unreserve(bo);
>  }
>
>  struct sg_table *qxl_gem_prime_get_sg_table(struct drm_gem_object *obj)
> diff --git a/drivers/gpu/drm/radeon/radeon_prime.c 
> b/drivers/gpu/drm/radeon/radeon_prime.c
> index b3cfc99f4d7ed..a77881f035e7a 100644
> --- a/drivers/gpu/drm/radeon/radeon_prime.c
> +++ b/drivers/gpu/drm/radeon/radeon_prime.c
> @@ -73,32 +73,21 @@ int radeon_gem_prime_pin(struct drm_gem_object *obj)
> struct radeon_bo *bo = gem_to_radeon_bo(obj);
> int ret = 0;
>
> -   ret = radeon_bo_reserve(bo, false);
> -   if (unlikely(ret != 0))
> -   return ret;
> -
> /* pin buffer into GTT */
> ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL);
> if (likely(ret == 0))
> bo->prime_shared_count++;
>
> -   radeon_bo_unreserve(bo);
> return ret;
>  }
>
>  void radeon_gem_prime_unpin(struct drm_gem_object *obj)
>  {
> struct radeon_bo *bo = gem_to_radeon_bo(obj);
> -   int ret = 0;
> -
> -   ret = radeon_bo_reserve(bo, false);
> -   if (unlikely(ret != 0))
> -   return;
>
> radeon_bo_unpin(bo);
> if (bo->prime_shared_count)
> bo->prime_shared_count--;
> -   radeon_bo_unreserve(bo);
>  }
>
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> index 12787bb9c111d..186150f41fbcc 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> @@ -48,33 +48,20 @@ static void vmw_gem_object_close(struct drm_gem_object 
> *obj,
>  {
>  }
>
> -static int vmw_gem_pin_private(struct drm_gem_object *obj, bool do_pin)
> +static int vmw_gem_object_pin(struct drm_gem_object *obj)
>  {
> -   struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(obj);
> struct vmw_bo *vbo = to_vmw_bo(obj);
> -   int ret;
> -
> -   ret = ttm_bo_reserve(bo, false, false, NULL);
> -   if (unlikely(ret != 0))
> -   goto err;
> -
> -   vmw_bo_pin_reserved(vbo, do_pin);
> -
> -   ttm_bo_unreserve(bo);
> -
> -err:
> -   return ret;
> -}
>
> +   vmw_bo_pin_reserved(vbo, true);
>
> -static int vmw_gem_object_pin(struct drm_gem_object *obj)
> -{
> -   return vmw_gem_pin_private(obj, true);
> +   return 0;
>  }
>
>  static void vmw_gem_object_unpin(struct drm_gem_object *obj)
>  {
> -   vmw_gem_pin_private(obj, false);
> +   struct vmw_bo *vbo = to_vmw_bo(obj);
> +
> +   vmw_bo_pin_reserved(vbo, false);
>  }
>
>  static struct sg_table *vmw_gem_object_get_sg_table(struct drm_gem_object 
> *obj)
> diff --git a/include/drm/drm_gem_shmem_helper.h 
> b/include/drm/drm_gem_shmem_helper.h
> index eb12aa9a8c556..efbc9f27312b5 100644
> --- a/include/drm/drm_gem_shmem_helper.h
> +++ b/include/drm/drm_gem_shmem_helper.h
> @@ -175,15 +175,8 @@ static inline void 
> drm_gem_shmem_object_print_info(struct drm_printer *p, unsign
>  static inline int drm_gem_shmem_object_pin(struct drm_gem_object *obj)
>  {
> struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> -   int ret;
>
> -   ret = dma_resv_lock_interruptible(shmem->base.resv, NULL);
> -   if (ret)
> -   return ret;
> -   ret = drm_gem_shmem_pin_locked(shmem);
> -   dma_resv_unlock(shmem->base.resv);
> -
> -   return ret;
> +   return drm_gem_shmem_pin_locked(shmem);
>  }
>
>  /**
> @@ -197,9 +190,7 @@ static inline void drm_gem_shmem_object_unpin(struct 
> drm_gem_object *obj)
>  {
> struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
>
> -   dma_resv_lock(shmem->base.resv, NULL);
> drm_gem_shmem_unpin_locked(shmem);
> -   dma_resv_unlock(shmem->base.resv);
>  }
>
>

Ah, I see. Looks great.

Reviewed-by: Zack Rusin 


Re: [PATCH 08/13] drm/qxl: Acquire reservation lock in GEM pin/unpin callbacks

2024-02-27 Thread Zack Rusin
On Tue, Feb 27, 2024 at 6:39 AM Thomas Zimmermann  wrote:
>
> Acquire the reservation lock directly in GEM pin callback. Same for
> unpin. Prepares for further changes.
>
> Dma-buf locking semantics require callers to hold the buffer's
> reservation lock when invoking the pin and unpin callbacks. Prepare
> qxl accordingly by pushing locking out of the implementation. A
> follow-up patch will fix locking for all GEM code at once.
>
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/qxl/qxl_prime.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_prime.c b/drivers/gpu/drm/qxl/qxl_prime.c
> index 9169c26357d36..f2646603e12eb 100644
> --- a/drivers/gpu/drm/qxl/qxl_prime.c
> +++ b/drivers/gpu/drm/qxl/qxl_prime.c
> @@ -31,15 +31,27 @@
>  int qxl_gem_prime_pin(struct drm_gem_object *obj)
>  {
> struct qxl_bo *bo = gem_to_qxl_bo(obj);
> +   int r;
>
> -   return qxl_bo_pin(bo);
> +   r = qxl_bo_reserve(bo);
> +   if (r)
> +   return r;
> +   r = qxl_bo_pin_locked(bo);
> +   qxl_bo_unreserve(bo);
> +
> +   return r;
>  }
>
>  void qxl_gem_prime_unpin(struct drm_gem_object *obj)
>  {
> struct qxl_bo *bo = gem_to_qxl_bo(obj);
> +   int r;
>
> -   qxl_bo_unpin(bo);
> +   r = qxl_bo_reserve(bo);
> +   if (r)
> +   return;
> +   qxl_bo_unpin_locked(bo);
> +   qxl_bo_unreserve(bo);
>  }

It looks like gem_prime_pin/unpin is largely the same between a lot of
drivers now. That might be a nice cleanup in the future.
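
Something along these lines could probably be shared (a rough,
untested sketch; the helper and its locked callback are hypothetical):

#include <drm/drm_gem.h>
#include <linux/dma-resv.h>

static int gem_prime_pin_generic(struct drm_gem_object *obj,
				 int (*pin_locked)(struct drm_gem_object *obj))
{
	int ret;

	/* Take the reservation lock, as dma-buf locking semantics require. */
	ret = dma_resv_lock_interruptible(obj->resv, NULL);
	if (ret)
		return ret;
	/* Let the driver do the actual pin while the lock is held. */
	ret = pin_locked(obj);
	dma_resv_unlock(obj->resv);

	return ret;
}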

z


Re: [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

2024-02-07 Thread Zack Rusin
On Tue, Feb 6, 2024 at 4:30 PM Ian Forbes  wrote:
>
> So the issue is that SVGA_3D_CMD_DX_PRED_COPY_REGION between 2
> surfaces that are the size of the mode fails. Technically for this to
> work the filter will have to be 1/2 of graphics mem. I was just lucky
> that the next mode in the list was already less than 1/2. 3/4 is not
> actually going to work. Also this only happens on X/Gnome and seems
> more like an issue with the compositor. Wayland/Gnome displays the
> desktop but it's unusable and glitches even with the 1/2 limit. I
> don't think wayland even abides by the mode limits as I see it trying
> to create surfaces larger than the mode. It might be using texture
> limits instead.

So the SVGA_3D_CMD_DX_PRED_COPY_REGION is only available with dx
contexts/3d enabled/gb surfaces. With 3d or gb objects disabled we
should fall back to legacy display and that command shouldn't have
been used. Is that the case? Does it work with 3d/gb objects disabled?

There are a few bugs there:
- SVGA_3D_CMD_DX_PRED_COPY_REGION should only come from userspace, and
userspace should validate that the max amount of resident memory
hasn't been exceeded before issuing those copies
- vmwgfx should be a lot better about determining whether the amount
of resident memory required by the current command buffers has been
exceeded
- In case of high memory pressure vmwgfx should explicitly disable 3d
support. There's no way to run 3d workloads with anything less than
64MB of RAM, especially given that we do not adjust our texture limits
and they will remain at 4k, 8k or more depending on what we're
running on.

But those are secondary to making resolution switching work correctly
on a basic system, i.e.:
1) Disable 3D and gb objects
2) Check if in the kernel log vmwgfx says that it's using "legacy display"
3) Check if the resolution switching works correctly
4) If not, let's fix that first (fix #1)
5) Disable 3D and keep gb objects active
6) Check that the kernel log says it selected "screen target display
unit" and that 3d is disabled (i.e. no SVGA_3D_CMD_DX_PRED_COPY_REGION
is coming through)
7) If that doesn't work, let's fix that next (fix #2)
8) Enable 3d and gb objects (your current default)
9) Check if max_mob_pages (i.e. max_resident_memory) is smaller than
what we'd need to hold even a single texture at the texture limits *
4bpp; print a warning and disable 3d (this should bring us in line
with what we fixed in point #7) (fix #3)

So basically we want to make sure that on vmwgfx all three
configurations work: 1) 3d and gb objects disabled, 2) 3d disabled, gb
objects enabled, 3) 3d and gb objects enabled.

z


Re: [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

2024-02-02 Thread Zack Rusin
On Fri, Feb 2, 2024 at 11:58 AM Ian Forbes  wrote:
>
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.

Sorry, I didn't notice this originally, but that's not quite true.
svga doesn't require all mob memory to stay within max_mob_pages
(which is SVGA_REG_GBOBJECT_MEM_SIZE_KB). max_mob_pages is really max
resident memory, or suggested-guest-memory-for-best-performance; we
can grow that memory (and we do). I think what's causing problems on
systems with low memory is that the cursor mobs and the fb need to
both be resident but can't be. Now, SVGA_REG_MAX_PRIMARY_MEM is the
max memory in which our topology needs to fit (which is
max_primary_mem on vmwgfx), but afaict that's not the issue here, and
it's checked later in vmw_kms_validate_mode_vram.

> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.

Yes, we should probably rename max_mob_pages to max_resident_memory
instead to make this obvious.

> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.

Does this work if you disable gb objects? Without gb objects we won't
have screen targets and thus won't be offsetting by 1/4, so I wonder
whether 4MB of VRAM with legacy display would work at 1280x800.

Also, you want to add a "V2" section to your change to describe what
changed in v2 vs v1 (and the same for any subsequent revision).

>
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index cd4925346ed4..84e1b765cda3 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2858,12 +2858,17 @@ enum drm_mode_status vmw_connector_mode_valid(struct 
> drm_connector *connector,
> struct vmw_private *dev_priv = vmw_priv(dev);
> u32 max_width = dev_priv->texture_max_width;
> u32 max_height = dev_priv->texture_max_height;
> -   u32 assumed_cpp = 4;
> -
> -   if (dev_priv->assume_16bpp)
> -   assumed_cpp = 2;
> +   u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +   u32 pitch = mode->hdisplay * assumed_cpp;
> +   u64 total = mode->vdisplay * pitch;
> +   bool using_stdu = dev_priv->active_display_unit == 
> vmw_du_screen_target;
> +   u64 max_mem_for_st = dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4;
> +   /* ^^^ Max memory for the mode fb when using Screen Target / MOBs.
> +* We need a carveout (1/4) to account for other gfx resources that 
> are
> +* required in gfx mem for an fb update to complete with low gfx mem 
> (<64MiB).
> +*/

Same wording issue as mentioned above, and let's use the normal
comment style (i.e. comments attach to the code below them).
max_mem_for_st should probably be max_mem_for_mode or
max_mem_for_mode_st.

> -   if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +   if (using_stdu) {
> max_width  = min(dev_priv->stdu_max_width,  max_width);
> max_height = min(dev_priv->stdu_max_height, max_height);
> }
> @@ -2874,9 +2879,10 @@ enum drm_mode_status vmw_connector_mode_valid(struct 
> drm_connector *connector,
> if (max_height < mode->vdisplay)
> return MODE_BAD_VVALUE;
>
> -   if (!vmw_kms_validate_mode_vram(dev_priv,
> -   mode->hdisplay * assumed_cpp,
> -   mode->vdisplay))
> +   if (using_stdu && (total > max_mem_for_st || total > 
> dev_priv->max_mob_size))
> +   return MODE_MEM;
> +
> +   if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
> return MODE_MEM;

It might make sense to just reuse vmw_kms_validate_mode_vram: it does
what we're claiming to do here, and even though it's named after VRAM
it does actually validate st primary memory.

z


Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

2024-01-30 Thread Zack Rusin
On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone  wrote:
>
> Hi,
>
> On Tue, 30 Jan 2024 at 18:39, Zack Rusin  wrote:
> > In general, yes. Of course it's a little more convoluted because we'll
> > act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
> > will fake page-flips because the only memory we'll have is a single
> > buffer as the actual page-flipping happens in the presentation code on
> > the host. So the guest is not aware of the actual presentation (it's
> > also why we don't have any sort of vblank signaling in vmwgfx, the
> > concept just doesn't exist for us). i.e. on para-virtualized drivers
> > the actual page-flips will be property of the presentation code that's
> > outside of the guest. It's definitely one those things that I wanted
> > to have a good solution for in a while, in particular to have a better
> > story behind vblank handling, but it's difficult because
> > "presentation" on vm's is in general difficult to define - it might be
> > some vnc connected host on the other continent. Having said that
> > that's basically a wonky VRR display so we should be able to handle
> > our presentation as VRR and give more control of updates to the guest,
> > but we haven't done it yet.
>
> Please don't.
>
> Photon time is _a_ useful metric, but only backwards-informational.
> It's nice to give userspace a good forward estimate of when pixels
> will hit retinas, but as it's not fully reliable, the main part is
> being able to let it know when it did happen so it can adjust. Given
> that it's not reliable, we can't use it as a basis for preparing
> submissions though, so we don't, even on bare-metal drivers.
>
> As you've noted though, it really falls apart on non-bare-metal cases,
> especially where latency vastly exceeds throughput, or when either is
> hugely variable. So we don't ever use it as a basis.
>
> VRR is worse though. The FRR model is 'you can display new content
> every $period, and here's your basis so you can calibrate phase'. The
> VRR model is 'you can display new content so rapidly it's not worth
> trying to quantise, just fire it as rapidly as possible'. That's a
> world away from 'e ... might be 16ms, might be 500? dunno really'.
>
> The entire model we have is that basis timing flows backwards. The
> 'hardware' gives us a deadline, KMS angles to meet that with a small
> margin, the compositor angles to meet that with a margin again, and it
> lines up client repaints to hit that window too. Everything works on
> that model, so it's not super surprising that using svga is - to quote
> one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

That's very hurtful. Or it would be, but of course you didn't believe
them, because they're working on Weston and so clearly don't make good
choices in general, right? The presentation on esxi is just as smooth
as it is by default on Ubuntu on new hardware...

> Given that the entire ecosystem is based on this model, I don't think
> there's an easy way out where svga just does something wildly
> different. The best way to fix it is to probably work on predictable
> quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> your current throughput, with something similar to hotplug/LINK_STATUS
> and faked EDID to let userspace know when the period changes. If you
> have variability within the cycle, e.g. dropped frames, then just suck
> it up and keep the illusion alive to userspace that it's presenting to
> a fixed period, and if/when you calculate there's a better
> quantisation then let userspace know what it is so it can adjust.
>
> But there's really no future in just doing random presentation rates,
> because that's not the API anyone has written for.

See, my hope was that with vrr we could layer the weird remote
presentation semantics of a virtualized guest on top of the same
infrastructure that would be used on real hardware. If you're saying
that's not the way userspace will work, then yeah, that doesn't help.
My issue, which is general for para-virtualized drivers, is that any
behavior that differs from hw drivers means it's going to break at
some point; we see that even for basic things like the update-layout
hotplug events that have been largely standardized for many years. I'm
assuming that refresh-rate-changed will result in the same
regressions, but fwiw, if I can implement FRR correctly and punt any
issues that arise due to changes in the FRR as issues in userspace,
then that does make my life a lot easier, so I'm not going to object
to that.

z


Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

2024-01-30 Thread Zack Rusin
On Fri, Jan 12, 2024 at 4:20 PM Ian Forbes  wrote:
>
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.
>
> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.
>
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.
>
> Signed-off-by: Ian Forbes 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 28ff30e32fab..39d6d17fc488 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct 
> drm_connector *connector,
> struct vmw_private *dev_priv = vmw_priv(dev);
> u32 max_width = dev_priv->texture_max_width;
> u32 max_height = dev_priv->texture_max_height;
> -   u32 assumed_cpp = 4;
> +   u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +   u32 pitch = mode->hdisplay * assumed_cpp;
> +   u64 total = mode->vdisplay * pitch;
> +   bool using_stdu = dev_priv->active_display_unit == 
> vmw_du_screen_target;
>
> -   if (dev_priv->assume_16bpp)
> -   assumed_cpp = 2;
> -
> -   if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +   if (using_stdu) {
> max_width  = min(dev_priv->stdu_max_width,  max_width);
> max_height = min(dev_priv->stdu_max_height, max_height);
> }
> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct 
> drm_connector *connector,
> if (max_height < mode->vdisplay)
> return MODE_BAD_VVALUE;
>
> -   if (!vmw_kms_validate_mode_vram(dev_priv,
> -   mode->hdisplay * assumed_cpp,
> -   mode->vdisplay))
> +   if (using_stdu &&
> +   (total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||

Could you export that computation somewhere where we could document
why we're doing it? Just so we don't leave the awkward "* 3 / 4" that
everyone reading this code will wonder about.
Also, make sure you indent this correctly; "dim checkpatch" should
warn about this.
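
Something like this, maybe (a hypothetical helper, untested; the name
is made up):

/*
 * Max framebuffer memory for a mode on screen targets. Reserve 1/4 of
 * graphics memory for the other resources (cursor mobs, surface copies
 * for damage clips) that need to be resident alongside the fb while a
 * command executes.
 */
static u64 vmw_max_mem_for_mode(const struct vmw_private *dev_priv)
{
	u64 max_resident = (u64)dev_priv->max_mob_pages * PAGE_SIZE;

	return max_resident - max_resident / 4;
}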

z


Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

2024-01-30 Thread Zack Rusin
On Fri, Jan 19, 2024 at 4:22 AM Thomas Zimmermann  wrote:
>
> Hi
>
> Am 18.01.24 um 19:25 schrieb Zack Rusin:
> > On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann  
> > wrote:
> >>
> >> Hi
> >>
> >> Am 12.01.24 um 21:38 schrieb Ian Forbes:
> >>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> >>> means that modes with a final buffer size that would exceed graphics 
> >>> memory
> >>> must be pruned otherwise creation will fail.
> >>>
> >>> Additionally, device commands which use multiple graphics resources must
> >>> have all their resources fit within graphics memory for the duration of 
> >>> the
> >>> command. Thus we need a small carve out of 1/4 of graphics memory to 
> >>> ensure
> >>> commands likes surface copies to the primary framebuffer for cursor
> >>> composition or damage clips can fit within graphics memory.
> >>>
> >>> This fixes an issue where VMs with low graphics memory (< 64MiB) 
> >>> configured
> >>> with high resolution mode boot to a black screen because surface creation
> >>> fails.
> >>
> >> That is a long-standing problem, which we have observed with other
> >> drivers as well. On low-memory devices, TTM doesn't play well. The real
> >> fix would be to export all modes that possibly fit and sort out the
> >> invalid configurations in atomic_check. It's just a lot more work.
> >>
> >> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
> >> VRAM?
> >
> > Unfortunately we can't do that because on new esx servers without
> > gpu's the default is 16MB. A lot of people are still running their esx
> > boxes with 4MB, which is in general the most common problem because
> > with 4MB people still tend to like to set 1280x800 which with 32bpp fb
> > takes 4096000 bytes and with 4MB available that leaves only 96KB
> > available and we need more to also allocate things like the cursor.
> > Even if ttm did everything right technically 1280x800 @ 32bpp
> > resolution will fit in a 4MB graphics memory, but then the system will
> > not be able to have a hardware (well, virtualized) cursor. It's
> > extremely unlikely people would even be aware of this tradeoff when
> > making the decision to increase resolution.
>
> Do you allocate buffer storage directly in the provided VRAM? If so how
> do you do page flips then? You'd need for the example of 1280x800-32,
> you'd need around 8 MiB to keep front and back buffer in VRAM. I guess,
> you only support the framebuffer console (which doesn't do pageflips)?

In general, yes. Of course it's a little more convoluted because we
act like an OpenGL runtime here (i.e. glXSwapBuffers): our driver
fakes page-flips because the only memory we have is a single buffer,
and the actual page-flipping happens in the presentation code on the
host. So the guest is not aware of the actual presentation (it's also
why we don't have any sort of vblank signaling in vmwgfx; the concept
just doesn't exist for us), i.e. on para-virtualized drivers the
actual page-flips are the property of presentation code that's outside
of the guest. It's definitely one of those things that I've wanted a
good solution for in a while, in particular to have a better story
behind vblank handling, but it's difficult because "presentation" on
VMs is in general hard to define - it might be some vnc-connected host
on another continent. Having said that, that's basically a wonky VRR
display, so we should be able to handle our presentation as VRR and
give more control of updates to the guest, but we haven't done it yet.
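
Concretely, the "fake flip" amounts to something like this (a
simplified sketch, not the actual vmwgfx code):

static void fake_flip_atomic_flush(struct drm_crtc *crtc,
				   struct drm_atomic_state *state)
{
	struct drm_pending_vblank_event *event = crtc->state->event;

	/* Blit the damage into the single host-visible buffer here... */

	/* ...then complete the "flip" right away; there's no real vblank
	 * to wait for. */
	if (event) {
		crtc->state->event = NULL;
		spin_lock_irq(&crtc->dev->event_lock);
		drm_crtc_send_vblank_event(crtc, event);
		spin_unlock_irq(&crtc->dev->event_lock);
	}
}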

> In mgag200 and ast, I had the luxury for replacing TTM with SHMEM
> helpers, which worked around the problem easily. Maybe that's an option
> for low-memory systems?

Our current device doesn't have the ability to present out of
unspecified memory in the guest, i.e. the host, which is doing the
presentation, is not aware of how the guest kernel lays out memory, so
we basically need to create a page table for every graphics object
(VMW_PL_MOB placement in vmwgfx) so that the host can actually find
the memory it needs to read. So the shmem helpers would need something
extra for us to be able to generate those page tables for the
drm_gem_objects they deal with.
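
To illustrate, the "something extra" would be along these lines (a
completely hypothetical sketch; the descriptor format below is made
up):

#include <drm/drm_gem_shmem_helper.h>
#include <linux/scatterlist.h>

static int build_mob_page_table(struct drm_gem_shmem_object *shmem, u64 *pt)
{
	struct sg_table *sgt = drm_gem_shmem_get_pages_sgt(shmem);
	struct sg_page_iter iter;
	unsigned long i = 0;

	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* Feed each backing page's PFN into the device-visible table so
	 * the host can translate guest graphics addresses. */
	for_each_sgtable_page(sgt, &iter, 0)
		pt[i++] = page_to_phys(sg_page_iter_page(&iter)) >> PAGE_SHIFT;

	return 0;
}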

z


[PATCH 1/5] drm/vmwgfx: Refactor drm connector probing for display modes

2024-01-26 Thread Zack Rusin
From: Martin Krastev 

Implement drm_connector_helper_funcs.mode_valid and .get_modes,
replacing the custom drm_connector_funcs.fill_modes code with
drm_helper_probe_single_connector_modes, for the STDU, LDU & SOU
display units.

Signed-off-by: Martin Krastev 
Reviewed-by: Zack Rusin 
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  | 272 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |   6 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  |   5 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c |   5 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |   4 +-
 5 files changed, 101 insertions(+), 191 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 8589a1c3cc36..2398041502c9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void vmw_du_cleanup(struct vmw_display_unit *du)
 {
@@ -2282,107 +2283,6 @@ vmw_du_connector_detect(struct drm_connector 
*connector, bool force)
connector_status_connected : connector_status_disconnected);
 }
 
-static struct drm_display_mode vmw_kms_connector_builtin[] = {
-   /* 640x480@60Hz */
-   { DRM_MODE("640x480", DRM_MODE_TYPE_DRIVER, 25175, 640, 656,
-  752, 800, 0, 480, 489, 492, 525, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC) },
-   /* 800x600@60Hz */
-   { DRM_MODE("800x600", DRM_MODE_TYPE_DRIVER, 4, 800, 840,
-  968, 1056, 0, 600, 601, 605, 628, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1024x768@60Hz */
-   { DRM_MODE("1024x768", DRM_MODE_TYPE_DRIVER, 65000, 1024, 1048,
-  1184, 1344, 0, 768, 771, 777, 806, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC) },
-   /* 1152x864@75Hz */
-   { DRM_MODE("1152x864", DRM_MODE_TYPE_DRIVER, 108000, 1152, 1216,
-  1344, 1600, 0, 864, 865, 868, 900, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1280x720@60Hz */
-   { DRM_MODE("1280x720", DRM_MODE_TYPE_DRIVER, 74500, 1280, 1344,
-  1472, 1664, 0, 720, 723, 728, 748, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1280x768@60Hz */
-   { DRM_MODE("1280x768", DRM_MODE_TYPE_DRIVER, 79500, 1280, 1344,
-  1472, 1664, 0, 768, 771, 778, 798, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1280x800@60Hz */
-   { DRM_MODE("1280x800", DRM_MODE_TYPE_DRIVER, 83500, 1280, 1352,
-  1480, 1680, 0, 800, 803, 809, 831, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
-   /* 1280x960@60Hz */
-   { DRM_MODE("1280x960", DRM_MODE_TYPE_DRIVER, 108000, 1280, 1376,
-  1488, 1800, 0, 960, 961, 964, 1000, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1280x1024@60Hz */
-   { DRM_MODE("1280x1024", DRM_MODE_TYPE_DRIVER, 108000, 1280, 1328,
-  1440, 1688, 0, 1024, 1025, 1028, 1066, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1360x768@60Hz */
-   { DRM_MODE("1360x768", DRM_MODE_TYPE_DRIVER, 85500, 1360, 1424,
-  1536, 1792, 0, 768, 771, 777, 795, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1440x1050@60Hz */
-   { DRM_MODE("1400x1050", DRM_MODE_TYPE_DRIVER, 121750, 1400, 1488,
-  1632, 1864, 0, 1050, 1053, 1057, 1089, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1440x900@60Hz */
-   { DRM_MODE("1440x900", DRM_MODE_TYPE_DRIVER, 106500, 1440, 1520,
-  1672, 1904, 0, 900, 903, 909, 934, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1600x1200@60Hz */
-   { DRM_MODE("1600x1200", DRM_MODE_TYPE_DRIVER, 162000, 1600, 1664,
-  1856, 2160, 0, 1200, 1201, 1204, 1250, 0,
-  DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1680x1050@60Hz */
-   { DRM_MODE("1680x1050", DRM_MODE_TYPE_DRIVER, 146250, 1680, 1784,
-  1960, 2240, 0, 1050, 1053, 1059, 1089, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1792x1344@60Hz */
-   { DRM_MODE("1792x1344", DRM_MODE_TYPE_DRIVER, 204750, 1792, 1920,
-  2120, 2448, 0, 1344, 1345, 1348, 1394, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) },
-   /* 1853x1392@60Hz */
-   { DRM_MODE("1856x1392", DRM_MODE_TYPE_DRIVER, 218250, 1856, 1952,
-  2176, 2528, 0, 1392, 1393, 1396, 1439, 0,
-  DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC) }

[PATCH 5/5] drm/vmwgfx: Fix the lifetime of the bo cursor memory

2024-01-26 Thread Zack Rusin
The cleanup can be dispatched while the atomic update is still active,
which means that the memory acquired in the atomic update must not be
invalidated by the cleanup. The buffer objects in vmw_plane_state were
trying to handle the lifetime of the mapped memory themselves instead
of using the builtin map_and_cache, leading to crashes.

Use the map_and_cache instead of trying to manage the lifetime of the
buffer objects held by the vmw_plane_state.

Fixes kernel oopses in IGT's kms_cursor_legacy forked-bo.

Signed-off-by: Zack Rusin 
Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds")
Cc:  # v6.2+
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e2bfaf4522a6..cd4925346ed4 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -185,13 +185,12 @@ static u32 vmw_du_cursor_mob_size(u32 w, u32 h)
  */
 static u32 *vmw_du_cursor_plane_acquire_image(struct vmw_plane_state *vps)
 {
-   bool is_iomem;
if (vps->surf) {
if (vps->surf_mapped)
return 
vmw_bo_map_and_cache(vps->surf->res.guest_memory_bo);
return vps->surf->snooper.image;
} else if (vps->bo)
-   return ttm_kmap_obj_virtual(&vps->bo->map, &is_iomem);
+   return vmw_bo_map_and_cache(vps->bo);
return NULL;
 }
 
@@ -653,22 +652,12 @@ vmw_du_cursor_plane_cleanup_fb(struct drm_plane *plane,
 {
struct vmw_cursor_plane *vcp = vmw_plane_to_vcp(plane);
struct vmw_plane_state *vps = vmw_plane_state_to_vps(old_state);
-   bool is_iomem;
 
if (vps->surf_mapped) {
vmw_bo_unmap(vps->surf->res.guest_memory_bo);
vps->surf_mapped = false;
}
 
-   if (vps->bo && ttm_kmap_obj_virtual(&vps->bo->map, &is_iomem)) {
-   const int ret = ttm_bo_reserve(&vps->bo->tbo, true, false, 
NULL);
-
-   if (likely(ret == 0)) {
-   ttm_bo_kunmap(&vps->bo->map);
-   ttm_bo_unreserve(&vps->bo->tbo);
-   }
-   }
-
vmw_du_cursor_plane_unmap_cm(vps);
vmw_du_put_cursor_mob(vcp, vps);
 
-- 
2.40.1



[PATCH 2/5] drm/vmwgfx: Make all surfaces shareable

2024-01-26 Thread Zack Rusin
From: Maaz Mombasawala 

There is no real need to have a separate pool for shareable and
non-shareable surfaces. Make all surfaces shareable, regardless of whether
the drm_vmw_surface_flag_shareable has been specified.

Signed-off-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/ttm_object.c |  6 +++---
 drivers/gpu/drm/vmwgfx/ttm_object.h |  3 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 17 ++---
 include/uapi/drm/vmwgfx_drm.h   |  5 +++--
 4 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/ttm_object.c 
b/drivers/gpu/drm/vmwgfx/ttm_object.c
index ddf8373c1d77..6806c05e57f6 100644
--- a/drivers/gpu/drm/vmwgfx/ttm_object.c
+++ b/drivers/gpu/drm/vmwgfx/ttm_object.c
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
  *
- * Copyright (c) 2009-2022 VMware, Inc., Palo Alto, CA., USA
+ * Copyright (c) 2009-2023 VMware, Inc., Palo Alto, CA., USA
  * All Rights Reserved.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
@@ -648,7 +648,6 @@ int ttm_prime_handle_to_fd(struct ttm_object_file *tfile,
  * @tfile: struct ttm_object_file identifying the caller
  * @size: The size of the dma_bufs we export.
  * @prime: The object to be initialized.
- * @shareable: See ttm_base_object_init
  * @type: See ttm_base_object_init
  * @refcount_release: See ttm_base_object_init
  *
@@ -656,10 +655,11 @@ int ttm_prime_handle_to_fd(struct ttm_object_file *tfile,
  * for data sharing between processes and devices.
  */
 int ttm_prime_object_init(struct ttm_object_file *tfile, size_t size,
- struct ttm_prime_object *prime, bool shareable,
+ struct ttm_prime_object *prime,
  enum ttm_object_type type,
  void (*refcount_release) (struct ttm_base_object **))
 {
+   bool shareable = !!(type == VMW_RES_SURFACE);
mutex_init(&prime->mutex);
prime->size = PAGE_ALIGN(size);
prime->real_type = type;
diff --git a/drivers/gpu/drm/vmwgfx/ttm_object.h 
b/drivers/gpu/drm/vmwgfx/ttm_object.h
index e6b77ee33e55..573e038c0fab 100644
--- a/drivers/gpu/drm/vmwgfx/ttm_object.h
+++ b/drivers/gpu/drm/vmwgfx/ttm_object.h
@@ -1,6 +1,6 @@
 /**
  *
- * Copyright (c) 2006-2022 VMware, Inc., Palo Alto, CA., USA
+ * Copyright (c) 2006-2023 VMware, Inc., Palo Alto, CA., USA
  * All Rights Reserved.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
@@ -288,7 +288,6 @@ extern void ttm_object_device_release(struct 
ttm_object_device **p_tdev);
 extern int ttm_prime_object_init(struct ttm_object_file *tfile,
 size_t size,
 struct ttm_prime_object *prime,
-bool shareable,
 enum ttm_object_type type,
 void (*refcount_release)
 (struct ttm_base_object **));
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 10498725034c..e7a744dfcecf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -832,8 +832,6 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void 
*data,
srf->snooper.image = NULL;
}
 
-   user_srf->prime.base.shareable = false;
-   user_srf->prime.base.tfile = NULL;
if (drm_is_primary_client(file_priv))
user_srf->master = drm_file_get_master(file_priv);
 
@@ -847,10 +845,10 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void 
*data,
goto out_unlock;
 
/*
-* A gb-aware client referencing a shared surface will
-* expect a backup buffer to be present.
+* A gb-aware client referencing a surface will expect a backup
+* buffer to be present.
 */
-   if (dev_priv->has_mob && req->shareable) {
+   if (dev_priv->has_mob) {
struct vmw_bo_params params = {
.domain = VMW_BO_DOMAIN_SYS,
.busy_domain = VMW_BO_DOMAIN_SYS,
@@ -869,8 +867,9 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void 
*data,
}
 
tmp = vmw_resource_reference(&srf->res);
-   ret = ttm_prime_object_init(tfile, res->guest_memory_size, 
&user_srf->prime,
-   req->shareable, VMW_RES_SURFACE,
+   ret = ttm_prime_object_init(tfile, res->guest_memory_size,
+   &user_srf->prime,
+   VMW_RES_SURFACE,
&vmw_user_s

[PATCH 3/5] drm/vmwgfx: Add SPDX header to vmwgfx_drm.h

2024-01-26 Thread Zack Rusin
From: Maaz Mombasawala 

Update vmwgfx_drm.h with SPDX-License-Identifier:
(GPL-2.0 WITH Linux-syscall-note) OR MIT

Signed-off-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
Signed-off-by: Zack Rusin 
---
 include/uapi/drm/vmwgfx_drm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/drm/vmwgfx_drm.h b/include/uapi/drm/vmwgfx_drm.h
index 26d96fecb902..7d786a0cc835 100644
--- a/include/uapi/drm/vmwgfx_drm.h
+++ b/include/uapi/drm/vmwgfx_drm.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
 /**
  *
  * Copyright © 2009-2023 VMware, Inc., Palo Alto, CA., USA
-- 
2.40.1



[PATCH 4/5] drm/vmwgfx: Fix vmw_du_get_cursor_mob fencing of newly-created MOBs

2024-01-26 Thread Zack Rusin
From: Martin Krastev 

The fencing of MOB creation used in vmw_du_get_cursor_mob was
incompatible with the register-based device communication employed by
this routine. As a result, cursor MOB creation was racy, leading to a
potentially broken/missing mouse cursor on desktops using the
CursorMob device feature.

Fixes: 53bc3f6fb6b3 ("drm/vmwgfx: Clean up cursor mobs")
Signed-off-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Zack Rusin 
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 2398041502c9..e2bfaf4522a6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -273,6 +273,7 @@ static int vmw_du_get_cursor_mob(struct vmw_cursor_plane 
*vcp,
u32 size = vmw_du_cursor_mob_size(vps->base.crtc_w, vps->base.crtc_h);
u32 i;
u32 cursor_max_dim, mob_max_size;
+   struct vmw_fence_obj *fence = NULL;
int ret;
 
if (!dev_priv->has_mob ||
@@ -314,7 +315,15 @@ static int vmw_du_get_cursor_mob(struct vmw_cursor_plane 
*vcp,
if (ret != 0)
goto teardown;
 
-   vmw_bo_fence_single(&vps->cursor.bo->tbo, NULL);
+   ret = vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
+   if (ret != 0) {
+   ttm_bo_unreserve(&vps->cursor.bo->tbo);
+   goto teardown;
+   }
+
+   dma_fence_wait(&fence->base, false);
+   dma_fence_put(&fence->base);
+
ttm_bo_unreserve(&vps->cursor.bo->tbo);
return 0;
 
-- 
2.40.1


