Re: [PATCH] kernel/resource: optimize find_next_iomem_res

2024-06-04 Thread Chia-I Wu
On Tue, Jun 4, 2024 at 8:41 AM Greg Kroah-Hartman
 wrote:
>
> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> > bo, the throughput goes from 5.1GB/s to 6.6GB/s.  perf report says
> >
> >   34.69%--__do_fault
> >   34.60%--amdgpu_gem_fault
> >   34.00%--ttm_bo_vm_fault_reserved
> >   32.95%--vmf_insert_pfn_prot
> >   25.89%--track_pfn_insert
> >   24.35%--lookup_memtype
> >   21.77%--pat_pagerange_is_ram
> >   20.80%--walk_system_ram_range
> >   17.42%--find_next_iomem_res
> >
> > before this change, and
> >
> >   26.67%--__do_fault
> >   26.57%--amdgpu_gem_fault
> >   25.83%--ttm_bo_vm_fault_reserved
> >   24.40%--vmf_insert_pfn_prot
> >   14.30%--track_pfn_insert
> >   12.20%--lookup_memtype
> >   9.34%--pat_pagerange_is_ram
> >   8.22%--walk_system_ram_range
> >   5.09%--find_next_iomem_res
> >
> > after.
>
> That's great, but why is walk_system_ram_range() being called so often?
>
> Shouldn't that be a "set up the device" only type of thing?  Why hammer
> on "lookup_memtype" when you know the memtype; you just did the same
> thing for the previous frame.
>
> This feels like it could be optimized to just "don't call these things"
> which would make it go faster, right?
>
> What am I missing here, why does this always have to be calculated all
> the time?  Resource mapping changes are rare, if at all, over the
> lifetime of a system uptime.  Constantly calculating something that
> never changes feels odd to me.
Yeah, that would be even better.

I am not familiar with the x86 PAT code.  I will have to defer that to
those more familiar with the matter.

>
> thanks,
>
> greg k-h


Re: [PATCH] kernel/resource: optimize find_next_iomem_res

2024-06-03 Thread Chia-I Wu
On Mon, Jun 3, 2024 at 12:24 AM Ilpo Järvinen
 wrote:
>
> On Sun, 2 Jun 2024, Andy Shevchenko wrote:
>
> > On Fri, May 31, 2024 at 02:31:45PM -0700, Chia-I Wu wrote:
> > > On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <
> > > andriy.shevche...@linux.intel.com> wrote:
> > > > On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> >
> > ...
> >
> > > > P.S. I'm not so sure about this change. It needs thorough testing, esp.
> > > > in the PCI case. Cc'ing Ilpo.
> >
> > > What's special about PCI?
> >
> > PCI, due to its nature, may rebuild resources, either shrinking or
> > expanding the entire subtree after the PCI bridge in question. And this
> > may happen at run-time due to hotplug support. But I'm not a deep expert
> > in this area; Ilpo knows much more than me.
>
> There is code which clearly tries to expand resources, but that usually
> fails to work as intended because of a parent resource whose size is
> fixed once it's already assigned.
>
> Some other code might block shrinking too under certain conditions.
>
> This area would need to be reworked in the PCI core, but that's a massive
> and scary-looking change.
Given the nature of this change (skip checking against children when
the parent does not match), unless a child resource can exceed its
parent resource, I don't think this change affects correctness.

The walk does not hold the resource lock outside of
find_next_iomem_res().  Updating the tree while the walk is in
progress has always been a bit ill-defined.  The patch does not change
that (but it might change the timing a bit).

I can export __walk_iomem_res_desc() and write some unit tests against
it.  Would that be enough to justify this change?

>
> --
>  i.


Re: [PATCH] kernel/resource: optimize find_next_iomem_res

2024-05-31 Thread Chia-I Wu
On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <
andriy.shevche...@linux.intel.com> wrote:

> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> > bo, the throughput goes from 5.1GB/s to 6.6GB/s.  perf report says
> >
> >   34.69%--__do_fault
> >   34.60%--amdgpu_gem_fault
> >   34.00%--ttm_bo_vm_fault_reserved
> >   32.95%--vmf_insert_pfn_prot
> >   25.89%--track_pfn_insert
> >   24.35%--lookup_memtype
> >   21.77%--pat_pagerange_is_ram
> >   20.80%--walk_system_ram_range
> >   17.42%--find_next_iomem_res
> >
> > before this change, and
> >
> >   26.67%--__do_fault
> >   26.57%--amdgpu_gem_fault
> >   25.83%--ttm_bo_vm_fault_reserved
> >   24.40%--vmf_insert_pfn_prot
> >   14.30%--track_pfn_insert
> >   12.20%--lookup_memtype
> >   9.34%--pat_pagerange_is_ram
> >   8.22%--walk_system_ram_range
> >   5.09%--find_next_iomem_res
> >
> > after.
>
> Is there any documentation that explicitly says that the children resources
> must not overlap the parent's one? Do we have some test cases? (Either way
> they need to be added / expanded.)
>
I think it's the opposite.  The assumption here is that a child is always a
subset of its parent.  Thus, if the range to be checked is not covered by a
parent, we can skip the children.

That's guaranteed by __request_resource.  I am less sure
about __insert_resource but it appears to be the case too.  FWIW,
resource_is_exclusive has the same assumption already.

It looks like I need to do some refactoring to add tests.


> P.S. I'm not so sure about this change. It needs thorough testing, esp.
> in the PCI case. Cc'ing Ilpo.
>
What's special about PCI?

-- 
> With Best Regards,
> Andy Shevchenko
>
>
>


[PATCH] kernel/resource: optimize find_next_iomem_res

2024-05-30 Thread Chia-I Wu
We can skip children resources when the parent resource does not cover
the range.

This should help vmf_insert_* users on x86, such as several DRM drivers.
On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
bo, the throughput goes from 5.1GB/s to 6.6GB/s.  perf report says

  34.69%--__do_fault
  34.60%--amdgpu_gem_fault
  34.00%--ttm_bo_vm_fault_reserved
  32.95%--vmf_insert_pfn_prot
  25.89%--track_pfn_insert
  24.35%--lookup_memtype
  21.77%--pat_pagerange_is_ram
  20.80%--walk_system_ram_range
  17.42%--find_next_iomem_res

before this change, and

  26.67%--__do_fault
  26.57%--amdgpu_gem_fault
  25.83%--ttm_bo_vm_fault_reserved
  24.40%--vmf_insert_pfn_prot
  14.30%--track_pfn_insert
  12.20%--lookup_memtype
  9.34%--pat_pagerange_is_ram
  8.22%--walk_system_ram_range
  5.09%--find_next_iomem_res

after.

Signed-off-by: Chia-I Wu 
---
 kernel/resource.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index fcbca39dbc450..19b84b4f9a577 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -326,6 +326,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
   unsigned long flags, unsigned long desc,
   struct resource *res)
 {
+   bool skip_children = false;
struct resource *p;
 
if (!res)
@@ -336,7 +337,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
 
	read_lock(&resource_lock);
 
-	for_each_resource(&iomem_resource, p, false) {
+	for_each_resource(&iomem_resource, p, skip_children) {
/* If we passed the resource we are looking for, stop */
if (p->start > end) {
p = NULL;
@@ -344,8 +345,11 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
}
 
/* Skip until we find a range that matches what we look for */
-   if (p->end < start)
+   if (p->end < start) {
+   skip_children = true;
continue;
+   }
+   skip_children = false;
 
if ((p->flags & flags) != flags)
continue;
-- 
2.45.1.288.g0e0cd299f1-goog



Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-08-31 Thread Chia-I Wu
On Thu, Aug 31, 2023 at 7:01 AM Greg KH  wrote:
>
> On Thu, Aug 31, 2023 at 03:26:28PM +0200, Christian König wrote:
> > Am 31.08.23 um 12:56 schrieb Greg KH:
> > > On Thu, Aug 31, 2023 at 12:27:27PM +0200, Christian König wrote:
> > > > Am 30.08.23 um 20:53 schrieb Chia-I Wu:
> > > > > On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin  wrote:
> > > > > > From: Lang Yu 
> > > > > >
> > > > > > [ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
> > > > > >
> > > > > > When using cpu to update page tables, vm update fences are unused.
> > > > > > Install stub fence into these fence pointers instead of NULL
> > > > > > to avoid NULL dereference when calling dma_fence_wait() on them.
> > > > > >
> > > > > > Suggested-by: Christian König 
> > > > > > Signed-off-by: Lang Yu 
> > > > > > Reviewed-by: Christian König 
> > > > > > Signed-off-by: Alex Deucher 
> > > > > > Signed-off-by: Sasha Levin 
> > > > > > ---
> > > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
> > > > > >1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > We start getting this warning spew on chromeos
> > > > Yeah because the older kernels still kept track of the last VM fence in 
> > > > the
> > > > syncobj.
> > > >
> > > > This patch here should probably not have been back ported.
> > > >
> > > > Why was that done anyway? The upstream commit doesn't have a CC stable 
> > > > and
> > > > this is only a bug fix for a new feature not present on older kernels.
> > > It is part of the AUTOSEL process.
> >
> > Could we prevent patches from being backported by adding a Fixes: tag?
>
> Yes, that will show exactly where the patch should be backported to.
This is also AUTOSEL'ed to 5.15.  That might need a revert as well,
depending on when the amdgpu feature landed.


>
> thanks,
>
> greg k-h


Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-08-30 Thread Chia-I Wu
On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin  wrote:
>
> From: Lang Yu 
>
> [ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
>
> When using cpu to update page tables, vm update fences are unused.
> Install stub fence into these fence pointers instead of NULL
> to avoid NULL dereference when calling dma_fence_wait() on them.
>
> Suggested-by: Christian König 
> Signed-off-by: Lang Yu 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

We started getting this warning spew on ChromeOS, likely from
dma_fence_is_later() because the stub fence is on a different timeline:

[  273.334767] WARNING: CPU: 1 PID: 13383 at
include/linux/dma-fence.h:478 amdgpu_sync_keep_later+0x95/0xbd
[  273.334769] Modules linked in: snd_seq_dummy snd_seq snd_seq_device
bridge stp llc tun vhost_vsock vhost vhost_iotlb
vmw_vsock_virtio_transport_common vsock 8021q veth lzo_rle
lzo_compress zram uinput snd_acp_sof_mach snd_acp_mach snd_soc_dmic
xt_cgroup rfcomm xt_MASQUERADE cmac algif_hash algif_skcipher af_alg
btusb btrtl btintel btbcm rtw89_8852ae rtw89_pci rtw89_8852a
rtw89_core snd_sof_amd_renoir snd_sof_xtensa_dsp snd_sof_amd_acp
snd_acp_pci snd_acp_config snd_soc_acpi snd_pci_acp3x snd_sof_pci
snd_sof snd_hda_codec_hdmi snd_sof_utils snd_hda_intel mac80211
snd_intel_dspcfg snd_hda_codec cros_ec_typec snd_hwdep roles
snd_hda_core typec snd_soc_rt5682s snd_soc_rt1019 snd_soc_rl6231
ip6table_nat i2c_piix4 fuse bluetooth ecdh_generic ecc cfg80211
iio_trig_sysfs cros_ec_lid_angle cros_ec_sensors cros_ec_sensors_core
industrialio_triggered_buffer kfifo_buf industrialio cros_ec_sensorhub
r8153_ecm cdc_ether usbnet r8152 mii uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2
[  273.334795]  videobuf2_common joydev
[  273.334799] CPU: 1 PID: 13383 Comm: chrome:cs0 Tainted: GW
   5.10.192-23384-g3d3f0f0c5e4f #1
fe1e7e3b7510aa7b8e01701478119255f825a36f
[  273.334800] Hardware name: Google Dewatt/Dewatt, BIOS
Google_Dewatt.14500.347.0 03/30/2023
[  273.334802] RIP: 0010:amdgpu_sync_keep_later+0x95/0xbd
[  273.334804] Code: 00 00 b8 01 00 00 00 f0 0f c1 43 38 85 c0 74 26
8d 48 01 09 c1 78 24 49 89 1e 5b 41 5e 5d c3 cc cc cc cc e8 4a 94 ac
ff eb ce <0f> 0b 49 8b 06 48 85 c0 75 af eb c2 be 02 00 00 00 48 8d 7b
38 e8
[  273.334805] RSP: 0018:b222c1817b50 EFLAGS: 00010293
[  273.334807] RAX: 89bfc838 RBX: 8aa425e9ed00 RCX: 
[  273.334808] RDX: 8aa426156a98 RSI: 8aa425e9ed00 RDI: 8aa432518918
[  273.334810] RBP: b222c1817b60 R08: 8aa43ca6c0a0 R09: 8aa33af3c9a0
[  273.334811] R10: fcf8c5986600 R11: 87a00fce R12: 0098
[  273.334812] R13: 005e2a00 R14: 8aa432518918 R15: 
[  273.334814] FS:  7e70f8694640() GS:8aa4e608()
knlGS:
[  273.334816] CS:  0010 DS:  ES:  CR0: 80050033
[  273.334817] CR2: 7e70ea049020 CR3: 000178e6e000 CR4: 00750ee0
[  273.334818] PKRU: 5554
[  273.334819] Call Trace:
[  273.334822]  ? __warn+0xa3/0x131
[  273.334824]  ? amdgpu_sync_keep_later+0x95/0xbd
[  273.334826]  ? report_bug+0x97/0xfa
[  273.334829]  ? handle_bug+0x41/0x66
[  273.334832]  ? exc_invalid_op+0x1b/0x72
[  273.334835]  ? asm_exc_invalid_op+0x12/0x20
[  273.334837]  ? native_sched_clock+0x9a/0x9a
[  273.334840]  ? amdgpu_sync_keep_later+0x95/0xbd
[  273.334843]  amdgpu_sync_vm_fence+0x23/0x39
[  273.334846]  amdgpu_cs_ioctl+0x1782/0x1e56
[  273.334851]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  273.334854]  drm_ioctl_kernel+0xdf/0x150
[  273.334858]  drm_ioctl+0x1f5/0x3d2
[  273.334928]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  273.334932]  amdgpu_drm_ioctl+0x49/0x81
[  273.334935]  __x64_sys_ioctl+0x7d/0xc8
[  273.334937]  do_syscall_64+0x42/0x54
[  273.334939]  entry_SYSCALL_64_after_hwframe+0x4a/0xaf
[  273.334941] RIP: 0033:0x7e70ff797649
[  273.334943] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10
c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00
00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1d 48 8b 45 c8 64 48 2b 04 25 28
00 00
[  273.334945] RSP: 002b:7e70f8693170 EFLAGS: 0246 ORIG_RAX:
0010
[  273.334947] RAX: ffda RBX:  RCX: 7e70ff797649
[  273.334948] RDX: 7e70f8693248 RSI: c0186444 RDI: 0013
[  273.334950] RBP: 7e70f86931c0 R08: 7e70f8693350 R09: 7e70f8693340
[  273.334951] R10: 000a R11: 0246 R12: c0186444
[  273.334952] R13: 7e70f8693380 R14: 7e70f8693248 R15: 0013
[  273.334954] ---[ end trace fc066a0fcea39e8c ]---


Re: [PATCH] drm/amdgpu: fix xclk freq on CHIP_STONEY

2023-06-02 Thread Chia-I Wu
On Fri, Jun 2, 2023 at 11:50 AM Alex Deucher  wrote:
>
> Nevermind, missing your Signed-off-by.  Please add and I'll apply.
Sorry that I keep forgetting...  This patch is

  Signed-off-by: Chia-I Wu 

I can send v2 if necessary.
>
> Alex
>


[PATCH v3] amdgpu: validate offset_in_bo of drm_amdgpu_gem_va

2023-06-01 Thread Chia-I Wu
This is motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.

v2: keep the validations in amdgpu_vm_bo_map
v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map
rather than to amdgpu_gem_va_ioctl

Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters")
Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 22f9a65ca0fc7..76d57bc7ac620 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1434,14 +1434,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
uint64_t eaddr;
 
/* validate the parameters */
-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK ||
-   size == 0 || size & ~PAGE_MASK)
+   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
+   return -EINVAL;
+   if (saddr + size <= saddr || offset + size <= offset)
return -EINVAL;
 
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
-   if (saddr >= eaddr ||
-   (bo && offset + size > amdgpu_bo_size(bo)) ||
+   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
(eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
return -EINVAL;
 
@@ -1500,14 +1500,14 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
int r;
 
/* validate the parameters */
-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK ||
-   size == 0 || size & ~PAGE_MASK)
+   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || size & ~PAGE_MASK)
+   return -EINVAL;
+   if (saddr + size <= saddr || offset + size <= offset)
return -EINVAL;
 
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
-   if (saddr >= eaddr ||
-   (bo && offset + size > amdgpu_bo_size(bo)) ||
+   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
(eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
return -EINVAL;
 
-- 
2.41.0.rc0.172.g3f132b7071-goog



[PATCH] drm/amdgpu: fix xclk freq on CHIP_STONEY

2023-06-01 Thread Chia-I Wu
According to Alex, most APUs from that time seem to have the same issue
(the vbios says 48 MHz, but the actual frequency is 100 MHz).  I only have
a CHIP_STONEY, so I limit the fixup to CHIP_STONEY.
---
 drivers/gpu/drm/amd/amdgpu/vi.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 770f2d7a371fc..6a8494f98d3ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -542,8 +542,15 @@ static u32 vi_get_xclk(struct amdgpu_device *adev)
u32 reference_clock = adev->clock.spll.reference_freq;
u32 tmp;
 
-   if (adev->flags & AMD_IS_APU)
-   return reference_clock;
+   if (adev->flags & AMD_IS_APU) {
+   switch (adev->asic_type) {
+   case CHIP_STONEY:
+   /* vbios says 48Mhz, but the actual freq is 100Mhz */
+   return 10000;
+   default:
+   return reference_clock;
+   }
+   }
 
tmp = RREG32_SMC(ixCG_CLKPIN_CNTL_2);
if (REG_GET_FIELD(tmp, CG_CLKPIN_CNTL_2, MUX_TCLK_TO_XCLK))
-- 
2.41.0.rc0.172.g3f132b7071-goog



[PATCH v2] amdgpu: validate drm_amdgpu_gem_va addrs

2023-05-23 Thread Chia-I Wu
Validate drm_amdgpu_gem_va addrs in amdgpu_gem_va_ioctl.
amdgpu_vm_bo_replace_map no longer needs to validate (and its
validations were insufficient anyway).  amdgpu_vm_bo_map has internal
users and its validations are kept.

This is motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.

Userspace (radeonsi and radv) seems fine as well.

v2: keep the validations in amdgpu_vm_bo_map

Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters")
Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  8 +---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d8e683688daab..36d5adfdf0f69 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -681,6 +681,21 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
uint64_t vm_size;
int r = 0;
 
+   if (args->va_address & ~PAGE_MASK || args->offset_in_bo & ~PAGE_MASK ||
+   args->map_size & ~PAGE_MASK) {
+   dev_dbg(dev->dev, "unaligned va_address 0x%LX, offset_in_bo 0x%LX, or map_size 0x%LX\n",
+   args->va_address, args->offset_in_bo, args->map_size);
+   return -EINVAL;
+   }
+
+   if (args->map_size == 0 ||
+   args->va_address + args->map_size < args->va_address ||
+   args->offset_in_bo + args->map_size < args->offset_in_bo) {
+   dev_dbg(dev->dev, "invalid map_size 0x%LX (va_address 0x%LX, offset_in_bo 0x%LX)\n",
+   args->map_size, args->va_address, args->offset_in_bo);
+   return -EINVAL;
+   }
+
if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
dev_dbg(dev->dev,
"va_address 0x%LX is in reserved area 0x%LX\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b9441ab457ea7..6307baaa136cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1501,15 +1501,9 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
uint64_t eaddr;
int r;
 
-   /* validate the parameters */
-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK ||
-   size == 0 || size & ~PAGE_MASK)
-   return -EINVAL;
-
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
-   if (saddr >= eaddr ||
-   (bo && offset + size > amdgpu_bo_size(bo)) ||
+   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
(eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
return -EINVAL;
 
-- 
2.40.1.698.g37aff9b760-goog



Re: [PATCH 1/2] amdgpu: validate drm_amdgpu_gem_va addrs for all ops

2023-05-23 Thread Chia-I Wu
On Mon, May 22, 2023 at 12:12 PM Christian König
 wrote:
>
> Am 21.05.23 um 20:49 schrieb Chia-I Wu:
> > On Thu, May 18, 2023 at 1:12 PM Alex Deucher  wrote:
> >> On Wed, May 17, 2023 at 5:27 PM Chia-I Wu  wrote:
> >>> On Tue, May 9, 2023 at 11:33 AM Chia-I Wu  wrote:
> >>>> Extend the address and size validations to AMDGPU_VA_OP_UNMAP and
> >>>> AMDGPU_VA_OP_CLEAR by moving the validations to amdgpu_gem_va_ioctl.
> >>>>
> >>>> Internal users of amdgpu_vm_bo_map are no longer validated but they
> >>>> should be fine.
> >>>>
> >>>> Userspace (radeonsi and radv) seems fine as well.
> >>> Does this series make sense?
> >> I think so, I haven't had a chance to go through this too closely yet,
> >> but amdgpu_vm_bo_map() is used by ROCm as well so we'd need to make
> >> sure that removing the checks in patch 1 wouldn't affect that path as
> >> well.  The changes in patch 2 look good.  Also, these patches are
> >> missing your SOB.
> > Indeed.  kfd_ioctl_alloc_memory_of_gpu, for example, does not validate
> > va.  I need to keep the validation in amdgpu_vm_bo_map for it at
> > least.  I guess it is more ideal for kfd_ioctl_alloc_memory_of_gpu to
> > validate, but I am not familiar with amdkfd..
> >
> > I can keep the existing validations, and duplicate them in
> > amdgpu_gem_va_ioctl to cover AMDGPU_VA_OP_UNMAP/AMDGPU_VA_OP_CLEAR.
>
> The key point is that unmap and clear don't need those validations.
>
> It's perfectly valid to request unmap of an unaligned mapping, it will
> just fail because we can't find that mapping.
unmap and clear_mappings convert addresses to GPU pages, so unaligned
addresses are treated as if they were aligned.  That's likely fine,
except it might be an unintentional inconsistency between VA ops?

When args->map_size is 0, eaddr can be smaller than saddr in
clear_mappings.  We are also at the mercy of how interval trees are
implemented.

>
> Regards,
> Christian.
>
> >
> >> Thanks,
> >>
> >> Alex
> >>
> >>
> >> Alex
>


Re: [PATCH 1/2] amdgpu: validate drm_amdgpu_gem_va addrs for all ops

2023-05-21 Thread Chia-I Wu
On Thu, May 18, 2023 at 1:12 PM Alex Deucher  wrote:
>
> On Wed, May 17, 2023 at 5:27 PM Chia-I Wu  wrote:
> >
> > On Tue, May 9, 2023 at 11:33 AM Chia-I Wu  wrote:
> > >
> > > Extend the address and size validations to AMDGPU_VA_OP_UNMAP and
> > > AMDGPU_VA_OP_CLEAR by moving the validations to amdgpu_gem_va_ioctl.
> > >
> > > Internal users of amdgpu_vm_bo_map are no longer validated but they
> > > should be fine.
> > >
> > > Userspace (radeonsi and radv) seems fine as well.
> > Does this series make sense?
>
> I think so, I haven't had a chance to go through this too closely yet,
> but amdgpu_vm_bo_map() is used by ROCm as well so we'd need to make
> sure that removing the checks in patch 1 wouldn't affect that path as
> well.  The changes in patch 2 look good.  Also, these patches are
> missing your SOB.
Indeed.  kfd_ioctl_alloc_memory_of_gpu, for example, does not validate
va.  I need to keep the validation in amdgpu_vm_bo_map for it at
least.  I guess it would be more ideal for kfd_ioctl_alloc_memory_of_gpu to
validate, but I am not familiar with amdkfd...

I can keep the existing validations, and duplicate them in
amdgpu_gem_va_ioctl to cover AMDGPU_VA_OP_UNMAP/AMDGPU_VA_OP_CLEAR.

>
> Thanks,
>
> Alex
>
>
> Alex


Re: [PATCH 1/2] amdgpu: validate drm_amdgpu_gem_va addrs for all ops

2023-05-17 Thread Chia-I Wu
On Tue, May 9, 2023 at 11:33 AM Chia-I Wu  wrote:
>
> Extend the address and size validations to AMDGPU_VA_OP_UNMAP and
> AMDGPU_VA_OP_CLEAR by moving the validations to amdgpu_gem_va_ioctl.
>
> Internal users of amdgpu_vm_bo_map are no longer validated but they
> should be fine.
>
> Userspace (radeonsi and radv) seems fine as well.
Does this series make sense?


[PATCH 2/2] amdgpu: validate drm_amdgpu_gem_va against overflows

2023-05-09 Thread Chia-I Wu
The existing validations are incorrect and insufficient.  This is
motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.

Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 6 ++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 071f6565cf971..36d5adfdf0f69 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -688,8 +688,11 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
}
 
-   if (args->map_size == 0) {
-   dev_dbg(dev->dev, "invalid map_size 0x%LX\n", args->map_size);
+   if (args->map_size == 0 ||
+   args->va_address + args->map_size < args->va_address ||
+   args->offset_in_bo + args->map_size < args->offset_in_bo) {
+   dev_dbg(dev->dev, "invalid map_size 0x%LX (va_address 0x%LX, offset_in_bo 0x%LX)\n",
+   args->map_size, args->va_address, args->offset_in_bo);
return -EINVAL;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fa5819d581655..cd0a0f06e11ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1437,8 +1437,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
-   if (saddr >= eaddr ||
-   (bo && offset + size > amdgpu_bo_size(bo)) ||
+   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
(eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
return -EINVAL;
 
@@ -1498,8 +1497,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
-   if (saddr >= eaddr ||
-   (bo && offset + size > amdgpu_bo_size(bo)) ||
+   if ((bo && offset + size > amdgpu_bo_size(bo)) ||
(eaddr >= adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT))
return -EINVAL;
 
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH 1/2] amdgpu: validate drm_amdgpu_gem_va addrs for all ops

2023-05-09 Thread Chia-I Wu
Extend the address and size validations to AMDGPU_VA_OP_UNMAP and
AMDGPU_VA_OP_CLEAR by moving the validations to amdgpu_gem_va_ioctl.

Internal users of amdgpu_vm_bo_map are no longer validated but they
should be fine.

Userspace (radeonsi and radv) seems fine as well.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 10 --
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d8e683688daab..071f6565cf971 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -681,6 +681,18 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
uint64_t vm_size;
int r = 0;
 
+   if (args->va_address & ~PAGE_MASK || args->offset_in_bo & ~PAGE_MASK ||
+   args->map_size & ~PAGE_MASK) {
+   dev_dbg(dev->dev, "unaligned va_address 0x%LX, offset_in_bo 0x%LX, or map_size 0x%LX\n",
+   args->va_address, args->offset_in_bo, args->map_size);
+   return -EINVAL;
+   }
+
+   if (args->map_size == 0) {
+   dev_dbg(dev->dev, "invalid map_size 0x%LX\n", args->map_size);
+   return -EINVAL;
+   }
+
if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
dev_dbg(dev->dev,
"va_address 0x%LX is in reserved area 0x%LX\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b9441ab457ea7..fa5819d581655 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1435,11 +1435,6 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
struct amdgpu_vm *vm = bo_va->base.vm;
uint64_t eaddr;
 
-   /* validate the parameters */
-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK ||
-   size == 0 || size & ~PAGE_MASK)
-   return -EINVAL;
-
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
if (saddr >= eaddr ||
@@ -1501,11 +1496,6 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
uint64_t eaddr;
int r;
 
-   /* validate the parameters */
-   if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK ||
-   size == 0 || size & ~PAGE_MASK)
-   return -EINVAL;
-
/* make sure object fit at this offset */
eaddr = saddr + size - 1;
if (saddr >= eaddr ||
-- 
2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v2] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
On Wed, Apr 26, 2023 at 4:05 AM Christian König
 wrote:
>
> Am 26.04.23 um 08:17 schrieb Chia-I Wu:
> > mgr->ctx_handles should be protected by mgr->lock.
> >
> > v2: improve commit message
> >
> > Signed-off-by: Chia-I Wu 
> > Cc: sta...@vger.kernel.org
>
> Please don't manually CC sta...@vger.kernel.org while sending patches
> out, let us maintainers push that upstream with the appropriate tag and
> Greg picking it up from there.
>
> A Fixes tag and figuring out to which stable versions this needs to be
> backported are nice to have as well, but Alex and I can take care of
> that as well.
>
> Apart from that the technical side of the patch is Reviewed-by:
> Christian König .
All done.  Thanks for clarifying the process and sorry for getting it
wrong in the first place :(
>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > index e9b45089a28a6..863b2a34b2d64 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > @@ -38,6 +38,7 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
> >   {
> >   struct fd f = fdget(fd);
> >   struct amdgpu_fpriv *fpriv;
> > + struct amdgpu_ctx_mgr *mgr;
> >   struct amdgpu_ctx *ctx;
> >   uint32_t id;
> >   int r;
> > @@ -51,8 +52,11 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
> >   return r;
> >   }
> >
> > - idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
> > + mgr = &fpriv->ctx_mgr;
> > + mutex_lock(&mgr->lock);
> > + idr_for_each_entry(&mgr->ctx_handles, ctx, id)
> >   amdgpu_ctx_priority_override(ctx, priority);
> > + mutex_unlock(&mgr->lock);
> >
> >   fdput(f);
> >   return 0;
>


[PATCH v3] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
mgr->ctx_handles should be protected by mgr->lock.

v2: improve commit message
v3: add a Fixes tag

Signed-off-by: Chia-I Wu 
Reviewed-by: Christian König 
Fixes: 52c6a62c64fac ("drm/amdgpu: add interface for editing a foreign process's priority v3")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index e9b45089a28a6..863b2a34b2d64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -38,6 +38,7 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 {
struct fd f = fdget(fd);
struct amdgpu_fpriv *fpriv;
+   struct amdgpu_ctx_mgr *mgr;
struct amdgpu_ctx *ctx;
uint32_t id;
int r;
@@ -51,8 +52,11 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
return r;
}
 
-   idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   idr_for_each_entry(&mgr->ctx_handles, ctx, id)
	amdgpu_ctx_priority_override(ctx, priority);
+   mutex_unlock(&mgr->lock);
 
fdput(f);
return 0;
-- 
2.40.1.495.gc816e09b53d-goog
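
The locking rule this patch enforces can be sketched in plain userspace C. This is an illustration only, not kernel code: the names (`toy_mgr`, `toy_override`) are hypothetical, a spinlock stands in for the kernel mutex, and a fixed array stands in for the idr. The point mirrors the fix: the walk over live handles happens entirely under the same lock that insert/remove paths take.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_HANDLES 8

/* A minimal spinlock stands in for the kernel mutex. */
struct toy_mgr {
    atomic_flag lock;
    int priority[MAX_HANDLES];  /* -1 marks a free slot */
};

static void toy_lock(struct toy_mgr *m)
{
    while (atomic_flag_test_and_set(&m->lock))
        ;  /* spin until the holder releases */
}

static void toy_unlock(struct toy_mgr *m)
{
    atomic_flag_clear(&m->lock);
}

/* Mirrors the fixed amdgpu_sched_process_priority_override(): the
 * iteration over live handles is bracketed by lock/unlock, so a
 * concurrent handle-close cannot mutate the table mid-walk. */
static void toy_override(struct toy_mgr *m, int prio)
{
    toy_lock(m);
    for (int id = 0; id < MAX_HANDLES; id++)
        if (m->priority[id] != -1)
            m->priority[id] = prio;
    toy_unlock(m);
}
```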



Re: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
On Tue, Apr 25, 2023 at 9:58 PM Greg KH  wrote:
>
> On Tue, Apr 25, 2023 at 05:48:27PM -0700, Chia-I Wu wrote:
> > Signed-off-by: Chia-I Wu 
> > Cc: sta...@vger.kernel.org
>
> I know I can not take patches without any changelog text at all, maybe
> the DRM developers are more lax, but it's not a good idea at all.
Oops, sorry.  I've sent out v2 to address both pieces of review feedback.
>
> thanks,
>
> greg k-h


[PATCH v2] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
mgr->ctx_handles should be protected by mgr->lock.

v2: improve commit message

Signed-off-by: Chia-I Wu 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index e9b45089a28a6..863b2a34b2d64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -38,6 +38,7 @@ static int amdgpu_sched_process_priority_override(struct 
amdgpu_device *adev,
 {
struct fd f = fdget(fd);
struct amdgpu_fpriv *fpriv;
+   struct amdgpu_ctx_mgr *mgr;
struct amdgpu_ctx *ctx;
uint32_t id;
int r;
@@ -51,8 +52,11 @@ static int amdgpu_sched_process_priority_override(struct 
amdgpu_device *adev,
return r;
}
 
-   idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   idr_for_each_entry(&mgr->ctx_handles, ctx, id)
amdgpu_ctx_priority_override(ctx, priority);
+   mutex_unlock(&mgr->lock);
 
fdput(f);
return 0;
-- 
2.40.1.495.gc816e09b53d-goog



Re: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-25 Thread Chia-I Wu
On Tue, Apr 25, 2023 at 7:27 PM Chen, Guchun  wrote:
>
> From coding style's perspective, this lock/unlock handling should be put into 
> amdgpu_ctx_priority_override.
The locking is to protect mgr->ctx_handles.
>
> Regards,
> Guchun
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of Chia-
> > I Wu
> > Sent: Wednesday, April 26, 2023 8:48 AM
> > To: dri-devel@lists.freedesktop.org
> > Cc: Pan, Xinhui ; linux-ker...@vger.kernel.org;
> > sta...@vger.kernel.org; amd-...@lists.freedesktop.org; Daniel Vetter
> > ; Deucher, Alexander ;
> > David Airlie ; Koenig, Christian
> > 
> > Subject: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED
> >
> > Signed-off-by: Chia-I Wu 
> > Cc: sta...@vger.kernel.org
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > index e9b45089a28a6..863b2a34b2d64 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > @@ -38,6 +38,7 @@ static int
> > amdgpu_sched_process_priority_override(struct amdgpu_device *adev,  {
> >   struct fd f = fdget(fd);
> >   struct amdgpu_fpriv *fpriv;
> > + struct amdgpu_ctx_mgr *mgr;
> >   struct amdgpu_ctx *ctx;
> >   uint32_t id;
> >   int r;
> > @@ -51,8 +52,11 @@ static int
> > amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
> >   return r;
> >   }
> >
> > - idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
> > + mgr = &fpriv->ctx_mgr;
> > + mutex_lock(&mgr->lock);
> > + idr_for_each_entry(&mgr->ctx_handles, ctx, id)
> >   amdgpu_ctx_priority_override(ctx, priority);
> > + mutex_unlock(&mgr->lock);
> >
> >   fdput(f);
> >   return 0;
> > --
> > 2.40.0.634.g4ca3ef3211-goog
>


[PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-25 Thread Chia-I Wu
Signed-off-by: Chia-I Wu 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index e9b45089a28a6..863b2a34b2d64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -38,6 +38,7 @@ static int amdgpu_sched_process_priority_override(struct 
amdgpu_device *adev,
 {
struct fd f = fdget(fd);
struct amdgpu_fpriv *fpriv;
+   struct amdgpu_ctx_mgr *mgr;
struct amdgpu_ctx *ctx;
uint32_t id;
int r;
@@ -51,8 +52,11 @@ static int amdgpu_sched_process_priority_override(struct 
amdgpu_device *adev,
return r;
}
 
-   idr_for_each_entry(&fpriv->ctx_mgr.ctx_handles, ctx, id)
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   idr_for_each_entry(&mgr->ctx_handles, ctx, id)
amdgpu_ctx_priority_override(ctx, priority);
+   mutex_unlock(&mgr->lock);
 
fdput(f);
return 0;
-- 
2.40.0.634.g4ca3ef3211-goog



[PATCH] drm/amdkfd: fix potential kgd_mem UAFs

2023-03-08 Thread Chia-I Wu
kgd_mem should be accessed with p->mutex locked, or it could have been
freed by kfd_ioctl_free_memory_of_gpu.

Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6d291aa6386bd..3c630114210d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1293,14 +1293,14 @@ static int kfd_ioctl_map_memory_to_gpu(struct file 
*filep,
args->n_success = i+1;
}
 
-   mutex_unlock(&p->mutex);
-
err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev, (struct kgd_mem *) 
mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
 
+   mutex_unlock(&p->mutex);
+
/* Flush TLBs after waiting for the page table updates to complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
@@ -1316,9 +1316,9 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 map_memory_to_gpu_failed:
+sync_memory_failed:
mutex_unlock(&p->mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
 
return err;
@@ -1332,6 +1332,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
void *mem;
long err = 0;
uint32_t *devices_arr = NULL, i;
+   bool flush_tlb;
 
if (!args->n_devices) {
pr_debug("Device IDs array empty\n");
@@ -1384,16 +1385,19 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
}
args->n_success = i+1;
}
-   mutex_unlock(&p->mutex);
 
-   if (kfd_flush_tlb_after_unmap(pdd->dev)) {
+   flush_tlb = kfd_flush_tlb_after_unmap(pdd->dev);
+   if (flush_tlb) {
err = amdgpu_amdkfd_gpuvm_sync_memory(pdd->dev->adev,
(struct kgd_mem *) mem, true);
if (err) {
pr_debug("Sync memory failed, wait interrupted by user 
signal\n");
goto sync_memory_failed;
}
+   }
+   mutex_unlock(&p->mutex);
 
+   if (flush_tlb) {
/* Flush TLBs after waiting for the page table updates to 
complete */
for (i = 0; i < args->n_devices; i++) {
peer_pdd = kfd_process_device_data_by_id(p, 
devices_arr[i]);
@@ -1409,9 +1413,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file 
*filep,
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 unmap_memory_from_gpu_failed:
+sync_memory_failed:
mutex_unlock(&p->mutex);
 copy_from_user_failed:
-sync_memory_failed:
kfree(devices_arr);
return err;
 }
-- 
2.40.0.rc1.284.g88254d51c5-goog
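
The bug class this patch fixes can be sketched in userspace C. This is an illustration with hypothetical names (`proc`, `kmem`), not kernel code: an object looked up under a lock may be freed by another path the moment the lock is dropped, so everything needed from the object is read while the lock is held, as the fix does with `flush_tlb`.

```c
#include <assert.h>
#include <stdbool.h>

struct kmem { bool needs_flush; };

struct proc {
    bool locked;       /* stands in for p->mutex being held */
    struct kmem *mem;  /* may be freed by another thread once unlocked */
};

/* Mirrors the fix: the flush decision is snapshotted from the object
 * while the lock is still held, so no kmem dereference happens after
 * the unlock point. */
static bool unmap_and_maybe_flush(struct proc *p)
{
    bool flush;

    p->locked = true;             /* mutex_lock(&p->mutex) */
    assert(p->locked);            /* object access requires the lock */
    flush = p->mem->needs_flush;  /* snapshot under the lock */
    p->locked = false;            /* mutex_unlock(&p->mutex) */
    /* From here on, only the snapshot is used, never p->mem. */
    return flush;
}
```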



[PATCH] drm/amdkfd: fix a potential double free in pqm_create_queue

2023-03-07 Thread Chia-I Wu
Set *q to NULL on errors; otherwise pqm_create_queue would free it
again.

Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 5137476ec18e6..4236539d9f932 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -218,8 +218,8 @@ static int init_user_queue(struct process_queue_manager 
*pqm,
return 0;
 
 cleanup:
-   if (dev->shared_resources.enable_mes)
-   uninit_queue(*q);
+   uninit_queue(*q);
+   *q = NULL;
return retval;
 }
 
-- 
2.40.0.rc0.216.gc4246ad0f0-goog
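
The pattern behind this fix can be shown in a small userspace sketch (hypothetical helper name; not the kfd code): when a callee frees an object owned through a caller's pointer, it must also NULL that pointer so the caller's own cleanup path does not free it a second time. `free(NULL)` is defined to do nothing, which is what makes the NULLing sufficient.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper mirroring init_user_queue(): on failure it frees
 * *q and clears the caller's pointer, so a later cleanup free in the
 * caller (as in pqm_create_queue) becomes a harmless free(NULL). */
static int init_queue_or_fail(int **q, int fail)
{
    *q = malloc(sizeof(**q));
    if (!fail)
        return 0;
    free(*q);
    *q = NULL;  /* the essence of the fix: prevent the second free */
    return -1;
}
```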



Re: [PATCH v2 0/3] drm/msm/gpu: Devfreq fixes+tuning

2023-01-13 Thread Chia-I Wu
Series is

Reviewed-by: Chia-I Wu 


On Tue, Jan 10, 2023 at 3:14 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Rob Clark (3):
>   drm/msm/gpu: Add devfreq tuning debugfs
>   drm/msm/gpu: Bypass PM QoS constraint for idle clamp
>   drm/msm/gpu: Add default devfreq thresholds
>
>  drivers/gpu/drm/msm/Kconfig   |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |   2 +-
>  drivers/gpu/drm/msm/msm_debugfs.c |  12 +++
>  drivers/gpu/drm/msm/msm_drv.h |   9 ++
>  drivers/gpu/drm/msm/msm_gpu.h |  15 ++-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 148 --
>  6 files changed, 100 insertions(+), 87 deletions(-)
>
> --
> 2.38.1
>


Re: [PATCH] drm/panfrost: Fix GEM handle creation UAF

2022-12-16 Thread Chia-I Wu
On Fri, Dec 16, 2022 at 4:20 PM Rob Clark  wrote:
>
> On Fri, Dec 16, 2022 at 3:59 PM Chia-I Wu  wrote:
> >
> > On Fri, Dec 16, 2022 at 3:34 PM Rob Clark  wrote:
> > >
> > > From: Rob Clark 
> > >
> > > Relying on an unreturned handle to hold a reference to an object we
> > > dereference is not safe.  Userspace can guess the handle and race us
> > > by closing the handle from another thread.  The _create_with_handle()
> > > that returns an object ptr is pretty much a pattern to avoid.  And
> > > ideally creating the handle would be done after any needed dereferencing.
> > > But in this case creation of the mapping is tied to the handle creation.
> > > Fortunately the mapping is refcnt'd and holds a reference to the object,
> > > so we can drop the handle's reference once we hold a mapping reference.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >  drivers/gpu/drm/panfrost/panfrost_drv.c |  7 +++
> > >  drivers/gpu/drm/panfrost/panfrost_gem.c | 10 +++---
> > >  2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> > > b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > index 2fa5afe21288..aa5848de647c 100644
> > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > @@ -98,6 +98,13 @@ static int panfrost_ioctl_create_bo(struct drm_device 
> > > *dev, void *data,
> > > return PTR_ERR(bo);
> > >
> > > mapping = panfrost_gem_mapping_get(bo, priv);
> > > +
> > > +   /*
> > > +* Now that the mapping holds a reference to the bo until we no 
> > > longer
> > > +* need it, we can safely drop the handle's reference.
> > > +*/
> > Not too familiar with panfrost, but I don't see
> > panfrost_gem_mapping_get hold a reference to the bo?
>
> It doesn't directly, but the mapping already holds a reference, taken
> before the handle reference is dropped
>
> It is all a bit too subtle for my taste.
Ack.
> > > +   drm_gem_object_put(&bo->base.base);
> > > +
> > > if (!mapping) {
> > > drm_gem_object_put(&bo->base.base);
> > > return -EINVAL;
> > > diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
> > > b/drivers/gpu/drm/panfrost/panfrost_gem.c
> > > index 293e799e2fe8..e3e21c500d24 100644
> > > --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> > > +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> > > @@ -234,6 +234,10 @@ struct drm_gem_object 
> > > *panfrost_gem_create_object(struct drm_device *dev, size_t
> > > return &obj->base.base;
> > >  }
> > >
> > > +/*
> > > + * NOTE: if this succeeds, both the handle and the returned object have
> > > + * an outstanding reference.
> > > + */
> > I might suggest dropping the "_with_handle" suffix.
>
> Maybe _and_handle would be a better name (for this and other cases
> that return both a handle and obj)?
Sounds good.  We will want that for panfrost, qxl, and vmwgfx.

The other drivers and helpers only need the handle.  Their
_with_handle should be fixed to not return the pointer.

>
> > The naming convention is used in several drivers.  I think we should
> > make it the case that the _with_handle variants always return the
> > handle without the pointer.  (And with the change, it immediately
> > becomes clear that qxl and vmwgfx also have similar issues).
>
> ugg, yeah, qxl does have the issue in the qxl_mode_dumb_create path.
> I overlooked that it returns an obj pointer by reference.
>
> on the surface vmwgfx looked ok, but I could have overlooked something.
>
> BR,
> -R
>
> > >  struct panfrost_gem_object *
> > >  panfrost_gem_create_with_handle(struct drm_file *file_priv,
> > > struct drm_device *dev, size_t size,
> > > @@ -261,10 +265,10 @@ panfrost_gem_create_with_handle(struct drm_file 
> > > *file_priv,
> > >  * and handle has the id what user can see.
> > >  */
> > > ret = drm_gem_handle_create(file_priv, &shmem->base, handle);
> > > -   /* drop reference from allocate - handle holds it now. */
> > > -   drm_gem_object_put(&shmem->base);
> > > -   if (ret)
> > > +   if (ret) {
> > > +   drm_gem_object_put(&shmem->base);
> > > return ERR_PTR(ret);
> > > +   }
> > >
> > > return bo;
> > >  }
> > > --
> > > 2.38.1
> > >
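
The race discussed above can be sketched with a toy userspace refcount (hypothetical `gobj_*` API, not the DRM one): as soon as a handle is published, another thread can close it and drop the handle's reference, so the creator must hold its own reference across any further dereference of the object instead of borrowing the handle's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct gobj { atomic_int refcount; };

static struct gobj *gobj_new(void)
{
    struct gobj *o = malloc(sizeof(*o));
    atomic_init(&o->refcount, 1);  /* the creator's own reference */
    return o;
}

static void gobj_get(struct gobj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Returns 1 when the last reference was dropped and the object freed. */
static int gobj_put(struct gobj *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Because the creator kept its own reference, an early handle-close by another thread cannot free the object out from under it.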


Re: [PATCH] drm/panfrost: Fix GEM handle creation UAF

2022-12-16 Thread Chia-I Wu
On Fri, Dec 16, 2022 at 3:34 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Relying on an unreturned handle to hold a reference to an object we
> dereference is not safe.  Userspace can guess the handle and race us
> by closing the handle from another thread.  The _create_with_handle()
> that returns an object ptr is pretty much a pattern to avoid.  And
> ideally creating the handle would be done after any needed dereferencing.
> But in this case creation of the mapping is tied to the handle creation.
> Fortunately the mapping is refcnt'd and holds a reference to the object,
> so we can drop the handle's reference once we hold a mapping reference.
>
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c |  7 +++
>  drivers/gpu/drm/panfrost/panfrost_gem.c | 10 +++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2fa5afe21288..aa5848de647c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -98,6 +98,13 @@ static int panfrost_ioctl_create_bo(struct drm_device 
> *dev, void *data,
> return PTR_ERR(bo);
>
> mapping = panfrost_gem_mapping_get(bo, priv);
> +
> +   /*
> +* Now that the mapping holds a reference to the bo until we no longer
> +* need it, we can safely drop the handle's reference.
> +*/
Not too familiar with panfrost, but I don't see
panfrost_gem_mapping_get hold a reference to the bo?

> +   drm_gem_object_put(&bo->base.base);
> +
> if (!mapping) {
> drm_gem_object_put(&bo->base.base);
> return -EINVAL;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
> b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 293e799e2fe8..e3e21c500d24 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -234,6 +234,10 @@ struct drm_gem_object *panfrost_gem_create_object(struct 
> drm_device *dev, size_t
> return &obj->base.base;
>  }
>
> +/*
> + * NOTE: if this succeeds, both the handle and the returned object have
> + * an outstanding reference.
> + */
I might suggest dropping the "_with_handle" suffix.

The naming convention is used in several drivers.  I think we should
make it the case that the _with_handle variants always return the
handle without the pointer.  (And with the change, it immediately
becomes clear that qxl and vmwgfx also have similar issues).

>  struct panfrost_gem_object *
>  panfrost_gem_create_with_handle(struct drm_file *file_priv,
> struct drm_device *dev, size_t size,
> @@ -261,10 +265,10 @@ panfrost_gem_create_with_handle(struct drm_file 
> *file_priv,
>  * and handle has the id what user can see.
>  */
> ret = drm_gem_handle_create(file_priv, &shmem->base, handle);
> -   /* drop reference from allocate - handle holds it now. */
> -   drm_gem_object_put(&shmem->base);
> -   if (ret)
> +   if (ret) {
> +   drm_gem_object_put(&shmem->base);
> return ERR_PTR(ret);
> +   }
>
> return bo;
>  }
> --
> 2.38.1
>


Re: [PATCH] drm/virtio: Fix GEM handle creation UAF

2022-12-16 Thread Chia-I Wu
On Fri, Dec 16, 2022 at 3:33 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Userspace can guess the handle value and try to race GEM object creation
> with handle close, resulting in a use-after-free if we dereference the
> object after dropping the handle's reference.  For that reason, dropping
> the handle's reference must be done *after* we are done dereferencing
> the object.
>
> Signed-off-by: Rob Clark 
Reviewed-by: Chia-I Wu 


Re: [PATCH] drm/msm: Enable clamp_to_idle for 7c3

2022-11-15 Thread Chia-I Wu
On Tue, Nov 15, 2022 at 8:01 AM Doug Anderson  wrote:
>
> Hi,
>
> On Tue, Nov 15, 2022 at 7:55 AM Rob Clark  wrote:
> >
> > From: Rob Clark 
> >
> > This was overlooked.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 +++---
> >  1 file changed, 7 insertions(+), 7 deletions(-)
>
> Reviewed-by: Douglas Anderson 
Reviewed-by: Chia-I Wu 


Re: [PATCH v3 2/2] drm/msm: Hangcheck progress detection

2022-11-09 Thread Chia-I Wu
> +   uint64_t ib2_base, ib2_rem;
> +};
> +
>  struct msm_ringbuffer {
> struct msm_gpu *gpu;
> int id;
> @@ -64,6 +69,25 @@ struct msm_ringbuffer {
> uint64_t memptrs_iova;
> struct msm_fence_context *fctx;
>
> +   /**
> +* hangcheck_progress_retries:
> +*
> +* The number of extra hangcheck duration cycles that we have given
> +* due to it appearing that the GPU is making forward progress.
> +*
> +* If the GPU appears to be making progress (ie. the CP has advanced
> +* in the command stream, we'll allow up to 
> DRM_MSM_HANGCHECK_PROGRESS_RETRIES
> +* expirations of the hangcheck timer before killing the job.  In 
> other
> +* words we'll let the submit run for up to
> +* DRM_MSM_HANGCHECK_DEFAULT_PERIOD *  
> DRM_MSM_HANGCHECK_PROGRESS_RETRIES
Rather than 500*3ms, the effective timeout is 250*4ms if I read
made_progress correctly.  That is, the formula is

  (DRM_MSM_HANGCHECK_DEFAULT_PERIOD / 2) *
(DRM_MSM_HANGCHECK_PROGRESS_RETRIES + 1)

Are you targeting 1500ms or 1000ms?  Either way, series is

Reviewed-by: Chia-I Wu 
Tested-by: Chia-I Wu 
(dEQP-GLES2.functional.flush_finish.wait)

> +*/
> +   int hangcheck_progress_retries;
> +
> +   /**
> +* last_cp_state: The state of the CP at the last call to 
> gpu->progress()
> +*/
> +   struct msm_cp_state last_cp_state;
> +
> /*
>  * preempt_lock protects preemption and serializes wptr updates 
> against
>  * preemption.  Can be aquired from irq context.
> --
> 2.38.1
>
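
The effective-timeout arithmetic from the review above can be written out directly. This is an illustrative calculation, not driver code: per the thread, the hangcheck timer fires every PERIOD/2 ms and the job is killed on the (RETRIES+1)-th expiry without progress.

```c
#include <assert.h>

/* Effective worst-case budget for a submit that keeps making progress:
 * (period / 2) * (progress_retries + 1), per the formula quoted in the
 * review. Argument values below are illustrative. */
static int effective_timeout_ms(int period_ms, int progress_retries)
{
    return (period_ms / 2) * (progress_retries + 1);
}
```

With a 500 ms default period and 3 progress retries this gives 250 * 4 = 1000 ms, matching the "250*4ms" reading in the review rather than "500*3ms".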


Re: [Freedreno] [PATCH 0/3] drm/msm/a6xx: devcore dump fixes

2022-10-13 Thread Chia-I Wu
On Thu, Oct 13, 2022 at 3:55 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> First patch fixes a recently introduced memory corruption, the remaining
> two are cleanups.
Series is

Reviewed-by: Chia-I Wu 

> Rob Clark (3):
>   drm/msm/a6xx: Fix kvzalloc vs state_kcalloc usage
>   drm/msm/a6xx: Skip snapshotting unused GMU buffers
>   drm/msm/a6xx: Remove state objects from list before freeing
>
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 18 --
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  7 ++-
>  2 files changed, 22 insertions(+), 3 deletions(-)
>
> --
> 2.37.3
>


Re: [Freedreno] [PATCH] drm/msm/gem: Unpin objects slightly later

2022-09-29 Thread Chia-I Wu
On Fri, Sep 23, 2022 at 3:41 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> The introduction of 025d27239a2f exposes a problem with f371bcc0c2ac, in
> that we need to keep the object pinned in the time the submit is queued
> up in the gpu scheduler.  Otherwise the shrinker will see it as a thing
> that can be evicted if we wait for it to be signaled.  But if the
> shrinker path is waiting on it with the obj lock held, the job cannot be
> scheduled, as that also requires briefly grabbing the obj lock, leading
> to deadlock.  (Not to mention, we don't want the shrinker to evict an
> obj queued up in gpu scheduler.)
>
> Fixes: f371bcc0c2ac ("drm/msm/gem: Unpin buffers earlier")
> Fixes: 025d27239a2f ("drm/msm/gem: Evict active GEM objects when necessary")
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/19
> Signed-off-by: Rob Clark 
Tested-by: Chia-I Wu 
> ---
>  drivers/gpu/drm/msm/msm_gem_submit.c | 4 ++--
>  drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
> b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 5599d93ec0d2..c670591995e6 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -501,11 +501,11 @@ static int submit_reloc(struct msm_gem_submit *submit, 
> struct msm_gem_object *ob
>   */
>  static void submit_cleanup(struct msm_gem_submit *submit, bool error)
>  {
> -   unsigned cleanup_flags = BO_LOCKED | BO_OBJ_PINNED;
> +   unsigned cleanup_flags = BO_LOCKED;
> unsigned i;
>
> if (error)
> -   cleanup_flags |= BO_VMA_PINNED;
> +   cleanup_flags |= BO_VMA_PINNED | BO_OBJ_PINNED;
>
> for (i = 0; i < submit->nr_bos; i++) {
> struct msm_gem_object *msm_obj = submit->bos[i].obj;
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
> b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index cad4c3525f0b..57a8e9564540 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -25,7 +25,8 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
> *job)
>
> msm_gem_lock(obj);
> msm_gem_unpin_vma_fenced(submit->bos[i].vma, fctx);
> -   submit->bos[i].flags &= ~BO_VMA_PINNED;
> +   msm_gem_unpin_locked(obj);
> +   submit->bos[i].flags &= ~(BO_VMA_PINNED | BO_OBJ_PINNED);
> msm_gem_unlock(obj);
> }
>
> --
> 2.37.2
>


Re: [PATCH v2] virtio-gpu: fix shift wrapping bug in virtio_gpu_fence_event_create()

2022-09-19 Thread Chia-I Wu
On Sun, Sep 18, 2022 at 11:36 PM Dan Carpenter  wrote:
>
> The ->ring_idx_mask variable is a u64 so static checkers, Smatch in
> this case, complain if the BIT() is not also a u64.
>
> drivers/gpu/drm/virtio/virtgpu_ioctl.c:50 virtio_gpu_fence_event_create()
> warn: should '(1 << ring_idx)' be a 64 bit type?
>
> Fixes: cd7f5ca33585 ("drm/virtio: implement context init: add 
> virtio_gpu_fence_event")
> Signed-off-by: Dan Carpenter 
> ---
> v2: Style change.  Use BIT_ULL().
Reviewed-by: Chia-I Wu 
>
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 3b1701607aae..5d05093014ac 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -47,7 +47,7 @@ static int virtio_gpu_fence_event_create(struct drm_device 
> *dev,
> struct virtio_gpu_fence_event *e = NULL;
> int ret;
>
> -   if (!(vfpriv->ring_idx_mask & (1 << ring_idx)))
> +   if (!(vfpriv->ring_idx_mask & BIT_ULL(ring_idx)))
> return 0;
>
> e = kzalloc(sizeof(*e), GFP_KERNEL);
> --
> 2.35.1
>


Re: [PATCH] virtio-gpu: fix shift wrapping bug in virtio_gpu_fence_event_create()

2022-09-15 Thread Chia-I Wu
On Thu, Sep 15, 2022 at 4:14 AM Dan Carpenter  wrote:
>
> The ->ring_idx_mask variable is a u64 so static checkers, Smatch in
> this case, complain if the BIT() is not also a u64.
>
> drivers/gpu/drm/virtio/virtgpu_ioctl.c:50 virtio_gpu_fence_event_create()
> warn: should '(1 << ring_idx)' be a 64 bit type?
>
> Fixes: cd7f5ca33585 ("drm/virtio: implement context init: add 
> virtio_gpu_fence_event")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 3b1701607aae..14eedb75f8a8 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -47,7 +47,7 @@ static int virtio_gpu_fence_event_create(struct drm_device 
> *dev,
> struct virtio_gpu_fence_event *e = NULL;
> int ret;
>
> -   if (!(vfpriv->ring_idx_mask & (1 << ring_idx)))
> +   if (!(vfpriv->ring_idx_mask & (1ULL << ring_idx)))
BIT_ULL(ring_idx)?

> return 0;
>
> e = kzalloc(sizeof(*e), GFP_KERNEL);
> --
> 2.35.1
>
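
The shift-wrapping bug fixed above can be demonstrated in isolation (toy helper, not the virtio-gpu code): `1 << ring_idx` is computed in 32-bit `int`, which overflows once `ring_idx` reaches 31, while `1ULL << ring_idx` (what `BIT_ULL()` expands to) keeps the full 64-bit width of the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Test a bit in a 64-bit ring mask. Using 1ULL (as BIT_ULL does) is
 * what makes indices >= 31 work; a plain `1 << ring_idx` would be
 * undefined/overflowing for those indices. */
static int ring_enabled(uint64_t mask, unsigned int ring_idx)
{
    return (mask & (1ULL << ring_idx)) != 0;
}
```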


[PATCH] drm/virtio: set fb_modifiers_not_supported

2022-08-31 Thread Chia-I Wu
Without this, the drm core advertises LINEAR modifier which is
incorrect.

Also userspace virgl does not support modifiers.  For example, it causes
chrome on ozone/drm to fail with "Failed to create scanout buffer".

Fixes: 2af104290da5 ("drm: introduce fb_modifiers_not_supported flag in 
mode_config")
Suggested-by: Shao-Chuan Lee 
Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c 
b/drivers/gpu/drm/virtio/virtgpu_display.c
index 5c7f198c0712..9ea7611a9e0f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -349,6 +349,8 @@ int virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev)
vgdev->ddev->mode_config.max_width = XRES_MAX;
vgdev->ddev->mode_config.max_height = YRES_MAX;
 
+   vgdev->ddev->mode_config.fb_modifiers_not_supported = true;
+
for (i = 0 ; i < vgdev->num_scanouts; ++i)
vgdev_output_init(vgdev, i);
 
-- 
2.37.2.789.g6183377224-goog



Re: [PATCH v2] drm/virtio: Fix NULL vs IS_ERR checking in virtio_gpu_object_shmem_init

2022-06-11 Thread Chia-I Wu
On Thu, Jun 2, 2022 at 3:42 AM Miaoqian Lin  wrote:
>
> Since drm_prime_pages_to_sg() function return error pointers.
> The drm_gem_shmem_get_sg_table() function returns error pointers too.
> Using IS_ERR() to check the return value to fix this.
>
> Fixes: 2f2aa13724d5 ("drm/virtio: move virtio_gpu_mem_entry initialization to 
> new function")
> Signed-off-by: Miaoqian Lin 
> ---
> changes in v2:
> - Update Fixes tag.
> - rebase the working tree.
> v1 Link: 
> https://lore.kernel.org/all/20211222072649.18169-1-linmq...@gmail.com/
Reviewed-by: Chia-I Wu 


Re: [PATCH v2] drm/msm/adreno: Allow larger address space size

2022-06-11 Thread Chia-I Wu
On Sun, May 29, 2022 at 11:04 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> The restriction to 4G was strictly to work around 64b math bug in some
> versions of SQE firmware.  This appears to be fixed in a650+ SQE fw, so
> allow a larger address space size on these devices.
>
> Also, add a modparam override for debugging and igt.
>
> v2: Send the right version of the patch (ie. the one that actually
> compiles)
>
> Signed-off-by: Rob Clark 
Reviewed-by: Chia-I Wu 


Re: [PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-27 Thread Chia-I Wu
On Wed, Apr 27, 2022 at 9:07 AM Rob Clark  wrote:
>
> On Tue, Apr 26, 2022 at 11:20 PM Christian König
>  wrote:
> >
> > Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> > > On Tue, Apr 26, 2022 at 11:02 AM Christian König
> > >  wrote:
> > >> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > >>> [SNIP]
> > >>>>>> Well I just send a patch to completely remove the trace point.
> > >>>>>>
> > >>>>>> As I said it absolutely doesn't make sense to use this for
> > >>>>>> visualization, that's what the trace_dma_fence_init trace point is 
> > >>>>>> good for.
> > >>> I am a bit confused by this.  _emit and _signaled are a great way to
> > >>> see how many fences are pending from cpu's point of view.  How does
> > >>> _emit make no sense and _init is good instead?
> > >> We had exactly that confusion now multiple times and it's one of the
> > >> main reasons why I want to remove the _emit trace point.
> > >>
> > >> See the when you want to know how many fences are pending you need to
> > >> watch out for init/destroy and *NOT* emit.
> > >>
> > >> The reason is that in the special case where emit makes sense (e.g. the
> > >> GPU scheduler fences) emit comes later than init, but pending on the CPU
> > >> and taking up resources are all fences and not just the one emitted to
> > >> the hardware.
> > > I am more interested in pending on the GPU.
> > >
> > >> On the other hand when you want to measure how much time each operation
> > >> took on the hardware you need to take a look at the differences of the
> > >> signal events on each timeline.
> > > _signaled alone is not enough when the GPU is not always busy.  After
> > > the last pending fence signals but before the following _init/_emit,
> > > nothing is pending.
> >
> > Yeah, I'm perfectly aware of that.
> >
> > > For all drivers except virtio-gpu, _init and "ring head update" always
> > > happen close enough that I can see why _emit is redundant.  But I like
> > > having _emit as a generic tracepoint for timelines where _init and
> > > _emit can be apart, instead of requiring a special case tracepoint for
> > > each special case timeline.
> >
> > And I'm certainly not going to add _emit to all drivers just because of
> > that. As you said it is a special case for virtio-gpu and the GPU scheduler.
> >
> > And as I explained before the difference between _init and _emit
> > shouldn't matter to your visualization. The background is that as soon
> > as an dma_fence is initialized with _init it is "live" regarding the
> > dependency and memory management and exactly that's what matters for the
> > visualization.
> >
> > The latency between _init and _emit is just interesting for debugging
> > the scheduler and surprisingly virtio-gpu as well, for all other use
> > cases it is irrelevant.
>
> It might actually be *more* interesting for virtio-gpu.. unless there
> is some other way to link guest and host fences to see what the
> potential latency of guest->host is
>
> re: adding the tracepoint to other drivers, I'm fine with folks doing
> that as needed.  Unless you have a better proposal about how to
> visualize init vs emit latency, I think it's fine to have an extra
> tracepoint even if it is redundant in some cases.  The visualization
> tool is the customer here, we have to give it what it wants/needs.
As far as perfetto is concerned, I just use either _init or _emit on a
per-timeline basis.  We can drop this patch for msm, and do not need
to change drivers whose latencies between _init/_emit are negligible.

init vs emit latency is still interesting.  I prefer keeping _init /
_emit as generic events that tools can parse, rather than adding
per-driver special cases that tools need to understand.

>
> BR,
> -R
>
> >
> > Regards,
> > Christian.
> >
> > >> So there isn't really any use case for the emit trace point, except when
> > >> you want to figure out how much latency the scheduler introduce. Then
> > >> you want to take a look at init and emit, but that isn't really that
> > >> interesting for performance analyses.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> >


Re: [PATCH v2 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-26 Thread Chia-I Wu
That would be great.  I don't have push permission.

On Tue, Apr 26, 2022 at 11:25 AM Andrey Grodzovsky
 wrote:
>
> It's ok to land but it wasn't, do you have push permissions to
> drm-misc-next ? If not, I will do it for you.
>
> Andrey
>
> On 2022-04-26 12:29, Chia-I Wu wrote:
> > On Tue, Apr 12, 2022 at 1:48 PM Chia-I Wu  wrote:
> >> drm_sched_job and drm_run_job have the same prototype.
> >>
> >> v2: rename the class from drm_sched_job_entity to drm_sched_job (Andrey)
> >>
> >> Signed-off-by: Chia-I Wu 
> >> Cc: Rob Clark 
> >> Reviewed-by: Andrey Grodzovsky 
> > This series has been reviewed.  Is it ok to land (if it hasn't)?


Re: [PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-26 Thread Chia-I Wu
On Tue, Apr 26, 2022 at 11:02 AM Christian König
 wrote:
>
> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > [SNIP]
> >>>> Well I just send a patch to completely remove the trace point.
> >>>>
> >>>> As I said it absolutely doesn't make sense to use this for
> >>>> visualization, that's what the trace_dma_fence_init trace point is good 
> >>>> for.
> > I am a bit confused by this.  _emit and _signaled are a great way to
> > see how many fences are pending from cpu's point of view.  How does
> > _emit make no sense and _init is good instead?
>
> We had exactly that confusion now multiple times and it's one of the
> main reasons why I want to remove the _emit trace point.
>
> See, when you want to know how many fences are pending you need to
> watch out for init/destroy and *NOT* emit.
>
> The reason is that in the special case where emit makes sense (e.g. the
> GPU scheduler fences) emit comes later than init, but pending on the CPU
> and taking up resources are all fences and not just the one emitted to
> the hardware.
I am more interested in pending on the GPU.

>
> On the other hand when you want to measure how much time each operation
> took on the hardware you need to take a look at the differences of the
> signal events on each timeline.
_signaled alone is not enough when the GPU is not always busy.  After
the last pending fence signals but before the following _init/_emit,
nothing is pending.

For all drivers except virtio-gpu, _init and "ring head update" always
happen close enough that I can see why _emit is redundant.  But I like
having _emit as a generic tracepoint for timelines where _init and
_emit can be apart, instead of requiring a special case tracepoint for
each special case timeline.
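To make concrete what "pending" means in this discussion, here is a small sketch (Python, using a made-up per-fence event list rather than real ftrace output) of how a tool could turn init/emit/signaled timestamps into pending intervals, from either the CPU's or the GPU's point of view:

```python
# Hypothetical, simplified event stream; a real tool would parse ftrace's
# dma_fence:dma_fence_init/emit/signaled events for one fence instead.
events = [
    (0, "init"),      # fence created on the CPU
    (5, "emit"),      # fence handed to the hardware
    (9, "signaled"),  # hardware completed the work
]

def pending_intervals(events, start_event):
    """Return (start, end) intervals during which the fence is pending,
    where "pending" begins at `start_event` ("init" or "emit")."""
    start = None
    intervals = []
    for ts, kind in events:
        if kind == start_event:
            start = ts
        elif kind == "signaled" and start is not None:
            intervals.append((start, ts))
            start = None
    return intervals

# From the CPU's point of view the fence is pending for 9 time units,
# but the hardware only worked on it for 4.
print(pending_intervals(events, "init"))  # [(0, 9)]
print(pending_intervals(events, "emit"))  # [(5, 9)]
```

When _init and _emit coincide the two views agree, which is why _emit looks redundant for most drivers; they diverge exactly in the drm_sched/virtio cases discussed above.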

>
> So there isn't really any use case for the emit trace point, except when
> you want to figure out how much latency the scheduler introduces. Then
> you want to take a look at init and emit, but that isn't really that
> interesting for performance analyses.
>
> Regards,
> Christian.
>


Re: [PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-26 Thread Chia-I Wu
On Tue, Apr 26, 2022 at 10:20 AM Christian König
 wrote:
>
> > On 26.04.22 at 19:16, Rob Clark wrote:
> > On Tue, Apr 26, 2022 at 10:08 AM Christian König
> >  wrote:
> >> On 26.04.22 at 19:05, Rob Clark wrote:
> >>> On Tue, Apr 26, 2022 at 9:42 AM Christian König
> >>>  wrote:
> >>>> On 26.04.22 at 18:32, Chia-I Wu wrote:
> >>>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu  wrote:
> >>>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
> >>>>>> enough and almost no driver calls trace_dma_fence_emit.  But drm_sched
> >>>>>> and virtio both have cases where trace_dma_fence_init and
> >>>>>> trace_dma_fence_emit can be apart.  It is easier for visualization 
> >>>>>> tools
> >>>>>> to always use the more correct trace_dma_fence_emit when visualizing
> >>>>>> fence timelines.
> >>>>>>
> >>>>>> v2: improve commit message (Dmitry)
> >>>>>>
> >>>>>> Signed-off-by: Chia-I Wu 
> >>>>>> Cc: Rob Clark 
> >>>>>> Reviewed-by: Dmitry Baryshkov 
> >>>>> This has been reviewed.  Should we land it?
> >>>> No, there are still open discussions about it.
> >>> I think if it is needed for trace visualization, that is justification
> >>> enough for me
> >>>
> >>> I don't really see otherwise how a generic trace visualization tool
> >>> like perfetto would handle the case that some fence timelines have
> >>> separate events but others do not.
> >> Well I just send a patch to completely remove the trace point.
> >>
> >> As I said it absolutely doesn't make sense to use this for
> >> visualization, that's what the trace_dma_fence_init trace point is good 
> >> for.
I am a bit confused by this.  _emit and _signaled are a great way to
see how many fences are pending from cpu's point of view.  How does
_emit make no sense and _init is good instead?

Or is this just that _init is good enough most of the time?  (More below)

> >>
> >> The only use case is for debugging the GPU scheduler and we should
> >> probably introduce a separate GPU scheduler specific trace point for
> >> this instead if we should ever need it.
> > Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit..
> > OTOH if using separate events in these special cases is better, then
> > I'm ok with that and can revert this patch.  Chia-I is more familiar
> > with the visualization end of it, so I'll let him comment on whether
> > that is a workable approach.
>
> Interesting, I wasn't aware of the virtgpu separation of init and emit.
>
> But yes if there is really an use case for tracing this time stamp as
> well then we should probably have that use case specific.
>
> I just looked into the scheduler case a bit and found that even there we
> already have a different trace point for it, which is probably the
> reason why we never used trace_dma_fence_emit there.
Yeah, I am using drm_sched tracepoints in that case.
>
> So yes, there really isn't much reason I can see to have two separate
> trace points for every driver.
That sounds fair.  In any tool, it should be easy to see if a fence
timeline has _emit in addition to _init, and adapt accordingly.  We
can drop this patch.

A clarification that _emit is optional/redundant for most fence
timelines should be nicer than removing it though.

>
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> (Or, how do I check if it has landed?)
>


Re: [PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-26 Thread Chia-I Wu
On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu  wrote:
>
> In practice, trace_dma_fence_init called from dma_fence_init is good
> enough and almost no driver calls trace_dma_fence_emit.  But drm_sched
> and virtio both have cases where trace_dma_fence_init and
> trace_dma_fence_emit can be apart.  It is easier for visualization tools
> to always use the more correct trace_dma_fence_emit when visualizing
> fence timelines.
>
> v2: improve commit message (Dmitry)
>
> Signed-off-by: Chia-I Wu 
> Cc: Rob Clark 
> Reviewed-by: Dmitry Baryshkov 
This has been reviewed.  Should we land it?

(Or, how do I check if it has landed?)


Re: [PATCH v2 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-26 Thread Chia-I Wu
On Tue, Apr 12, 2022 at 1:48 PM Chia-I Wu  wrote:
>
> drm_sched_job and drm_run_job have the same prototype.
>
> v2: rename the class from drm_sched_job_entity to drm_sched_job (Andrey)
>
> Signed-off-by: Chia-I Wu 
> Cc: Rob Clark 
> Reviewed-by: Andrey Grodzovsky 
This series has been reviewed.  Is it ok to land (if it hasn't)?


Re: [PATCH 3/3] drm/msm: return the average load over the polling period

2022-04-15 Thread Chia-I Wu
On Fri, Apr 15, 2022 at 5:33 PM Chia-I Wu  wrote:
>
> simple_ondemand interacts poorly with clamp_to_idle.  It only looks at
> the load since the last get_dev_status call, while it should really look
> at the load over polling_ms.  When clamp_to_idle is true, it almost always
> picks the lowest frequency on active because the gpu is idle between
> msm_devfreq_idle/msm_devfreq_active.
>
> This logic could potentially be moved into devfreq core.
The idea is to extend devfreq_simple_ondemand_data to specify whether
devfreq_simple_ondemand_func should use "last status" or "average
status" to determine the target frequency.  devfreq core will need to
store "struct devfreq_dev_status average_status", which will be
updated when a device uses simple_ondemand and asks for average_status
instead of last_status.
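As a rough user-space model of the averaging this series implements (constants and field names are mine; times in microseconds): short raw samples are merged into a running average, and once the combined window exceeds ~1.1x polling_ms the oldest history is scaled away.

```python
POLLING_MS = 50
MAX_HISTORY_MS = POLLING_MS * 11 // 10  # keep at most ~1.1 polling periods

def update_average(avg, raw):
    """avg/raw are dicts with busy/total times in usec and freq in Hz.
    Simplified model of update_average_dev_status from the patch."""
    if raw["total"] // 1000 >= POLLING_MS or avg["total"] == 0:
        return dict(raw)  # raw covers a full period, or there is no history
    if (avg["total"] + raw["total"]) // 1000 >= MAX_HISTORY_MS:
        # scale busy and total down by the same factor to drop old history
        new_total = POLLING_MS * 1000 - raw["total"]
        avg["busy"] = avg["busy"] * new_total // avg["total"]
        avg["total"] = new_total
    # average frequency weighted by how long each sample lasted
    freq = (avg["freq"] * avg["total"] + raw["freq"] * raw["total"]) \
        // (avg["total"] + raw["total"])
    return {"freq": freq,
            "busy": avg["busy"] + raw["busy"],
            "total": avg["total"] + raw["total"]}

avg = {"freq": 0, "busy": 0, "total": 0}
# idle<->active ping-pong: short samples instead of one 50ms period
avg = update_average(avg, {"freq": 200_000_000, "busy": 9_000, "total": 10_000})
avg = update_average(avg, {"freq": 200_000_000, "busy": 0, "total": 15_000})
# the reported load now covers 25ms, not just the idle 15ms sample
print(avg["busy"], avg["total"])  # 9000 25000
# a third sample pushes the window past ~55ms, so history is truncated
avg = update_average(avg, {"freq": 100_000_000, "busy": 5_000, "total": 35_000})
print(avg)  # freq=130_000_000, busy=10400, total=50000
```

This is only a sketch of the arithmetic; the kernel version additionally deals with u64 division helpers and lives inside the devfreq get_dev_status path.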



>
> Fixes: 7c0ffcd40b16 ("drm/msm/gpu: Respect PM QoS constraints")
> Signed-off-by: Chia-I Wu 
> Cc: Rob Clark 
> ---
>  drivers/gpu/drm/msm/msm_gpu.h |  3 ++
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 60 ++-
>  2 files changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 389c6dab751b..143c56f5185b 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -9,6 +9,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -117,6 +118,8 @@ struct msm_gpu_devfreq {
> /** idle_time: Time of last transition to idle: */
> ktime_t idle_time;
>
> +   struct devfreq_dev_status average_status;
> +
> /**
>  * idle_work:
>  *
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
> b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index f531015107c3..d2539ca78c29 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -9,6 +9,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /*
> @@ -75,12 +76,69 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
> status->busy_time = busy_time;
>  }
>
> +static void update_average_dev_status(struct msm_gpu *gpu,
> +   const struct devfreq_dev_status *raw)
> +{
> +   struct msm_gpu_devfreq *df = &gpu->devfreq;
> +   const u32 polling_ms = df->devfreq->profile->polling_ms;
> +   const u32 max_history_ms = polling_ms * 11 / 10;
> +   struct devfreq_dev_status *avg = &df->average_status;
> +   u64 avg_freq;
> +
> +   /* simple_ondemand governor interacts poorly with gpu->clamp_to_idle.
> +* When we enforce the constraint on idle, it calls get_dev_status
> +* which would normally reset the stats.  When we remove the
> +* constraint on active, it calls get_dev_status again where busy_time
> +* would be 0.
> +*
> +* To remedy this, we always return the average load over the past
> +* polling_ms.
> +*/
> +
> +   /* raw is longer than polling_ms or avg has no history */
> +   if (div_u64(raw->total_time, USEC_PER_MSEC) >= polling_ms ||
> +   !avg->total_time) {
> +   *avg = *raw;
> +   return;
> +   }
> +
> +   /* Truncate the oldest history first.
> +*
> +* Because we keep the history with a single devfreq_dev_status,
> +* rather than a list of devfreq_dev_status, we have to assume freq
> +* and load are the same over avg->total_time.  We can scale down
> +* avg->busy_time and avg->total_time by the same factor to drop
> +* history.
> +*/
> +   if (div_u64(avg->total_time + raw->total_time, USEC_PER_MSEC) >=
> +   max_history_ms) {
> +   const u32 new_total_time = polling_ms * USEC_PER_MSEC -
> +   raw->total_time;
> +   avg->busy_time = div_u64(
> +   mul_u32_u32(avg->busy_time, new_total_time),
> +   avg->total_time);
> +   avg->total_time = new_total_time;
> +   }
> +
> +   /* compute the average freq over avg->total_time + raw->total_time */
> +   avg_freq = mul_u32_u32(avg->current_frequency, avg->total_time);
> +   avg_freq += mul_u32_u32(raw->current_frequency, raw->total_time);
> +   do_div(avg_freq, avg->total_time + raw->total_time);
> +
> +   avg->current_frequency = avg_freq;
> +   avg->busy_time += raw->busy_time;
> +   avg->total_time += raw->total_time;
> +}
> +
>  static int msm_devfreq_get_dev_status(struct device *dev,
> struct devfreq_dev_status *status)
>  {
> struct msm_gpu *gpu = dev_to_gpu(dev);
> +   struct devfreq_dev_status raw;
>
> -   get_raw_dev_status(gpu, status);
> +   get_raw_dev_status(gpu, &raw);
> +   update_average_dev_status(gpu, &raw);
> +   *status = gpu->devfreq.average_status;
>
> return 0;
>  }
> --
> 2.36.0.rc0.470.gd361397f0d-goog
>


[PATCH 3/3] drm/msm: return the average load over the polling period

2022-04-15 Thread Chia-I Wu
simple_ondemand interacts poorly with clamp_to_idle.  It only looks at
the load since the last get_dev_status call, while it should really look
at the load over polling_ms.  When clamp_to_idle is true, it almost always
picks the lowest frequency on active because the gpu is idle between
msm_devfreq_idle/msm_devfreq_active.

This logic could potentially be moved into devfreq core.

Fixes: 7c0ffcd40b16 ("drm/msm/gpu: Respect PM QoS constraints")
Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.h |  3 ++
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 60 ++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 389c6dab751b..143c56f5185b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -117,6 +118,8 @@ struct msm_gpu_devfreq {
/** idle_time: Time of last transition to idle: */
ktime_t idle_time;
 
+   struct devfreq_dev_status average_status;
+
/**
 * idle_work:
 *
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index f531015107c3..d2539ca78c29 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -75,12 +76,69 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
status->busy_time = busy_time;
 }
 
+static void update_average_dev_status(struct msm_gpu *gpu,
+   const struct devfreq_dev_status *raw)
+{
+   struct msm_gpu_devfreq *df = &gpu->devfreq;
+   const u32 polling_ms = df->devfreq->profile->polling_ms;
+   const u32 max_history_ms = polling_ms * 11 / 10;
+   struct devfreq_dev_status *avg = &df->average_status;
+   u64 avg_freq;
+
+   /* simple_ondemand governor interacts poorly with gpu->clamp_to_idle.
+* When we enforce the constraint on idle, it calls get_dev_status
+* which would normally reset the stats.  When we remove the
+* constraint on active, it calls get_dev_status again where busy_time
+* would be 0.
+*
+* To remedy this, we always return the average load over the past
+* polling_ms.
+*/
+
+   /* raw is longer than polling_ms or avg has no history */
+   if (div_u64(raw->total_time, USEC_PER_MSEC) >= polling_ms ||
+   !avg->total_time) {
+   *avg = *raw;
+   return;
+   }
+
+   /* Truncate the oldest history first.
+*
+* Because we keep the history with a single devfreq_dev_status,
+* rather than a list of devfreq_dev_status, we have to assume freq
+* and load are the same over avg->total_time.  We can scale down
+* avg->busy_time and avg->total_time by the same factor to drop
+* history.
+*/
+   if (div_u64(avg->total_time + raw->total_time, USEC_PER_MSEC) >=
+   max_history_ms) {
+   const u32 new_total_time = polling_ms * USEC_PER_MSEC -
+   raw->total_time;
+   avg->busy_time = div_u64(
+   mul_u32_u32(avg->busy_time, new_total_time),
+   avg->total_time);
+   avg->total_time = new_total_time;
+   }
+
+   /* compute the average freq over avg->total_time + raw->total_time */
+   avg_freq = mul_u32_u32(avg->current_frequency, avg->total_time);
+   avg_freq += mul_u32_u32(raw->current_frequency, raw->total_time);
+   do_div(avg_freq, avg->total_time + raw->total_time);
+
+   avg->current_frequency = avg_freq;
+   avg->busy_time += raw->busy_time;
+   avg->total_time += raw->total_time;
+}
+
 static int msm_devfreq_get_dev_status(struct device *dev,
struct devfreq_dev_status *status)
 {
struct msm_gpu *gpu = dev_to_gpu(dev);
+   struct devfreq_dev_status raw;
 
-   get_raw_dev_status(gpu, status);
+   get_raw_dev_status(gpu, &raw);
+   update_average_dev_status(gpu, &raw);
+   *status = gpu->devfreq.average_status;
 
return 0;
 }
-- 
2.36.0.rc0.470.gd361397f0d-goog



[PATCH 2/3] drm/msm: simplify gpu_busy callback

2022-04-15 Thread Chia-I Wu
Move tracking and busy time calculation to msm_devfreq_get_dev_status.

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 19 ++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 15 +
 drivers/gpu/drm/msm/msm_gpu.h |  9 +++-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 32 ++-
 4 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 407f50a15faa..217615e0e850 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1662,28 +1662,23 @@ static struct msm_ringbuffer *a5xx_active_ring(struct 
msm_gpu *gpu)
return a5xx_gpu->cur_ring;
 }
 
-static unsigned long a5xx_gpu_busy(struct msm_gpu *gpu)
+static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 {
-   u64 busy_cycles, busy_time;
+   u64 busy_cycles;
 
/* Only read the gpu busy if the hardware is already active */
-   if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0)
+   if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
+   *out_sample_rate = 1;
return 0;
+   }
 
busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
-
-   busy_time = busy_cycles - gpu->devfreq.busy_cycles;
-   do_div(busy_time, clk_get_rate(gpu->core_clk) / 1000000);
-
-   gpu->devfreq.busy_cycles = busy_cycles;
+   *out_sample_rate = clk_get_rate(gpu->core_clk);
 
pm_runtime_put(&gpu->pdev->dev);
 
-   if (WARN_ON(busy_time > ~0LU))
-   return ~0LU;
-
-   return (unsigned long)busy_time;
+   return busy_cycles;
 }
 
 static uint32_t a5xx_get_rptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index ccc4fcf7a630..fba6259395dd 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1649,12 +1649,14 @@ static void a6xx_destroy(struct msm_gpu *gpu)
kfree(a6xx_gpu);
 }
 
-static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
+static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
-   u64 busy_cycles, busy_time;
+   u64 busy_cycles;
 
+   /* 19.2MHz */
+   *out_sample_rate = 19200000;
 
/* Only read the gpu busy if the hardware is already active */
if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
@@ -1664,17 +1666,10 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
 
-   busy_time = (busy_cycles - gpu->devfreq.busy_cycles) * 10;
-   do_div(busy_time, 192);
-
-   gpu->devfreq.busy_cycles = busy_cycles;
 
pm_runtime_put(a6xx_gpu->gmu.dev);
 
-   if (WARN_ON(busy_time > ~0LU))
-   return ~0LU;
-
-   return (unsigned long)busy_time;
+   return busy_cycles;
 }
 
 static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 02419f2ca2bc..389c6dab751b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -62,7 +62,7 @@ struct msm_gpu_funcs {
/* for generation specific debugfs: */
void (*debugfs_init)(struct msm_gpu *gpu, struct drm_minor *minor);
 #endif
-   unsigned long (*gpu_busy)(struct msm_gpu *gpu);
+   u64 (*gpu_busy)(struct msm_gpu *gpu, unsigned long *out_sample_rate);
struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
int (*gpu_state_put)(struct msm_gpu_state *state);
unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
@@ -106,11 +106,8 @@ struct msm_gpu_devfreq {
struct dev_pm_qos_request boost_freq;
 
/**
-* busy_cycles:
-*
-* Used by implementation of gpu->gpu_busy() to track the last
-* busy counter value, for calculating elapsed busy cycles since
-* last sampling period.
+* busy_cycles: Last busy counter value, for calculating elapsed busy
+* cycles since last sampling period.
 */
u64 busy_cycles;
 
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 317a95d42922..f531015107c3 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -49,18 +49,38 @@ static unsigned long get_freq(struct msm_gpu *gpu)
return clk_get_rate(gpu->core_clk);
 }
 
-static int msm_devfreq_get_dev_status(struct device *dev,
+static void get_raw_dev_status(struct msm_gp

[PATCH 1/3] drm/msm: remove explicit devfreq status reset

2022-04-15 Thread Chia-I Wu
It is redundant since commit 7c0ffcd40b16 ("drm/msm/gpu: Respect PM QoS
constraints") because dev_pm_qos_update_request triggers get_dev_status.

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 12641616acd3..317a95d42922 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -224,7 +224,6 @@ void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor)
 void msm_devfreq_active(struct msm_gpu *gpu)
 {
struct msm_gpu_devfreq *df = &gpu->devfreq;
-   struct devfreq_dev_status status;
unsigned int idle_time;
 
if (!has_devfreq(gpu))
@@ -248,12 +247,6 @@ void msm_devfreq_active(struct msm_gpu *gpu)
 
dev_pm_qos_update_request(&df->idle_freq,
  PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
-
-   /*
-* Reset the polling interval so we aren't inconsistent
-* about freq vs busy/total cycles
-*/
-   msm_devfreq_get_dev_status(&gpu->pdev->dev, &status);
 }
 
 
-- 
2.36.0.rc0.470.gd361397f0d-goog



[PATCH v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-12 Thread Chia-I Wu
In practice, trace_dma_fence_init called from dma_fence_init is good
enough and almost no driver calls trace_dma_fence_emit.  But drm_sched
and virtio both have cases where trace_dma_fence_init and
trace_dma_fence_emit can be apart.  It is easier for visualization tools
to always use the more correct trace_dma_fence_emit when visualizing
fence timelines.

v2: improve commit message (Dmitry)

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/msm_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index faf0c242874e..a82193f41ea2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Power Management:
@@ -769,6 +770,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
gpu->active_submits++;
mutex_unlock(&gpu->active_lock);
 
+   trace_dma_fence_emit(submit->hw_fence);
gpu->funcs->submit(gpu, submit);
gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
 
-- 
2.35.1.1178.g4f1659d476-goog



[PATCH v2 2/2] drm/sched: use __string in tracepoints

2022-04-12 Thread Chia-I Wu
Otherwise, ring names are marked [UNSAFE-MEMORY].

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
Reviewed-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/gpu_scheduler_trace.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h 
b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 188ae2105d57..3143ecaaff86 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -38,7 +38,7 @@ DECLARE_EVENT_CLASS(drm_sched_job,
TP_STRUCT__entry(
 __field(struct drm_sched_entity *, entity)
 __field(struct dma_fence *, fence)
-__field(const char *, name)
+__string(name, sched_job->sched->name)
 __field(uint64_t, id)
 __field(u32, job_count)
 __field(int, hw_job_count)
@@ -48,14 +48,14 @@ DECLARE_EVENT_CLASS(drm_sched_job,
   __entry->entity = entity;
   __entry->id = sched_job->id;
  __entry->fence = &sched_job->s_fence->finished;
-  __entry->name = sched_job->sched->name;
+  __assign_str(name, sched_job->sched->name);
   __entry->job_count = 
spsc_queue_count(&entity->job_queue);
   __entry->hw_job_count = atomic_read(
  &sched_job->sched->hw_rq_count);
   ),
TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw 
job count:%d",
  __entry->entity, __entry->id,
- __entry->fence, __entry->name,
+ __entry->fence, __get_str(name),
  __entry->job_count, __entry->hw_job_count)
 );
 
@@ -86,7 +86,7 @@ TRACE_EVENT(drm_sched_job_wait_dep,
TP_PROTO(struct drm_sched_job *sched_job, struct dma_fence *fence),
TP_ARGS(sched_job, fence),
TP_STRUCT__entry(
-__field(const char *,name)
+__string(name, sched_job->sched->name)
 __field(uint64_t, id)
 __field(struct dma_fence *, fence)
 __field(uint64_t, ctx)
@@ -94,14 +94,14 @@ TRACE_EVENT(drm_sched_job_wait_dep,
 ),
 
TP_fast_assign(
-  __entry->name = sched_job->sched->name;
+  __assign_str(name, sched_job->sched->name);
   __entry->id = sched_job->id;
   __entry->fence = fence;
   __entry->ctx = fence->context;
   __entry->seqno = fence->seqno;
   ),
TP_printk("job ring=%s, id=%llu, depends fence=%p, context=%llu, 
seq=%u",
- __entry->name, __entry->id,
+ __get_str(name), __entry->id,
  __entry->fence, __entry->ctx,
  __entry->seqno)
 );
-- 
2.35.1.1178.g4f1659d476-goog



[PATCH v2 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-12 Thread Chia-I Wu
drm_sched_job and drm_run_job have the same prototype.

v2: rename the class from drm_sched_job_entity to drm_sched_job (Andrey)

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
Reviewed-by: Andrey Grodzovsky 
---
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 31 +--
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h 
b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 877ce9b127f1..188ae2105d57 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -32,7 +32,7 @@
 #define TRACE_SYSTEM gpu_scheduler
 #define TRACE_INCLUDE_FILE gpu_scheduler_trace
 
-TRACE_EVENT(drm_sched_job,
+DECLARE_EVENT_CLASS(drm_sched_job,
TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity 
*entity),
TP_ARGS(sched_job, entity),
TP_STRUCT__entry(
@@ -59,31 +59,14 @@ TRACE_EVENT(drm_sched_job,
  __entry->job_count, __entry->hw_job_count)
 );
 
-TRACE_EVENT(drm_run_job,
+DEFINE_EVENT(drm_sched_job, drm_sched_job,
TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity 
*entity),
-   TP_ARGS(sched_job, entity),
-   TP_STRUCT__entry(
-__field(struct drm_sched_entity *, entity)
-__field(struct dma_fence *, fence)
-__field(const char *, name)
-__field(uint64_t, id)
-__field(u32, job_count)
-__field(int, hw_job_count)
-),
+   TP_ARGS(sched_job, entity)
+);
 
-   TP_fast_assign(
-  __entry->entity = entity;
-  __entry->id = sched_job->id;
-  __entry->fence = &sched_job->s_fence->finished;
-  __entry->name = sched_job->sched->name;
-  __entry->job_count = 
spsc_queue_count(&entity->job_queue);
-  __entry->hw_job_count = atomic_read(
-  &sched_job->sched->hw_rq_count);
-  ),
-   TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw 
job count:%d",
- __entry->entity, __entry->id,
- __entry->fence, __entry->name,
- __entry->job_count, __entry->hw_job_count)
+DEFINE_EVENT(drm_sched_job, drm_run_job,
+   TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity 
*entity),
+   TP_ARGS(sched_job, entity)
 );
 
 TRACE_EVENT(drm_sched_process_job,
-- 
2.35.1.1178.g4f1659d476-goog



Re: [PATCH] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-09 Thread Chia-I Wu
On Sat, Apr 9, 2022 at 7:33 AM Christian König  wrote:
>
> On 08.04.22 at 23:12, Chia-I Wu wrote:
> > In practice, trace_dma_fence_init is good enough and almost no driver
> > calls trace_dma_fence_emit.  But this is still more correct in theory.
>
> Well, the reason why basically no driver is calling this is because it
> is pretty much deprecated.
Why is it considered deprecated?  trace_dma_fence_{emit,signaled} are
useful to visualize fence timelines.  I am actually less sure about
how trace_dma_fence_{init,destroy} are used.

Is it because trace_dma_fence_init is called automatically, and is
good enough most of the time?

>
> We do have a case in the GPU scheduler where it makes sense to distinct
> between init and emit, but it doesn't really matter for drivers.
virtio also has a case where init and emit can be far apart, when the
host cannot process commands fast enough and there is no space in
virtqueue.

>
> So I'm not sure if it's a good idea to add that here.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Chia-I Wu 
> > Cc: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/msm_gpu.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index faf0c242874e..a82193f41ea2 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -15,6 +15,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >
> >   /*
> >* Power Management:
> > @@ -769,6 +770,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
> > msm_gem_submit *submit)
> >   gpu->active_submits++;
> >   mutex_unlock(&gpu->active_lock);
> >
> > + trace_dma_fence_emit(submit->hw_fence);
> >   gpu->funcs->submit(gpu, submit);
> >   gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
> >
>


[PATCH] drm/msm: add trace_dma_fence_emit to msm_gpu_submit

2022-04-08 Thread Chia-I Wu
In practice, trace_dma_fence_init is good enough and almost no driver
calls trace_dma_fence_emit.  But this is still more correct in theory.

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index faf0c242874e..a82193f41ea2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Power Management:
@@ -769,6 +770,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
gpu->active_submits++;
mutex_unlock(&gpu->active_lock);
 
+   trace_dma_fence_emit(submit->hw_fence);
gpu->funcs->submit(gpu, submit);
gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
 
-- 
2.35.1.1178.g4f1659d476-goog



[PATCH 2/2] drm/sched: use __string in tracepoints

2022-04-08 Thread Chia-I Wu
Otherwise, ring names are marked [UNSAFE-MEMORY].

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 drivers/gpu/drm/scheduler/gpu_scheduler_trace.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h 
b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 2e6eda920fe1..b5d99df65d93 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -38,7 +38,7 @@ DECLARE_EVENT_CLASS(drm_sched_job_entity,
TP_STRUCT__entry(
 __field(struct drm_sched_entity *, entity)
 __field(struct dma_fence *, fence)
-__field(const char *, name)
+__string(name, sched_job->sched->name)
 __field(uint64_t, id)
 __field(u32, job_count)
 __field(int, hw_job_count)
@@ -48,14 +48,14 @@ DECLARE_EVENT_CLASS(drm_sched_job_entity,
   __entry->entity = entity;
   __entry->id = sched_job->id;
  __entry->fence = &sched_job->s_fence->finished;
-  __entry->name = sched_job->sched->name;
+  __assign_str(name, sched_job->sched->name);
   __entry->job_count = 
spsc_queue_count(&entity->job_queue);
   __entry->hw_job_count = atomic_read(
  &sched_job->sched->hw_rq_count);
   ),
TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw 
job count:%d",
  __entry->entity, __entry->id,
- __entry->fence, __entry->name,
+ __entry->fence, __get_str(name),
  __entry->job_count, __entry->hw_job_count)
 );
 
@@ -86,7 +86,7 @@ TRACE_EVENT(drm_sched_job_wait_dep,
TP_PROTO(struct drm_sched_job *sched_job, struct dma_fence *fence),
TP_ARGS(sched_job, fence),
TP_STRUCT__entry(
-__field(const char *,name)
+__string(name, sched_job->sched->name)
 __field(uint64_t, id)
 __field(struct dma_fence *, fence)
 __field(uint64_t, ctx)
@@ -94,14 +94,14 @@ TRACE_EVENT(drm_sched_job_wait_dep,
 ),
 
TP_fast_assign(
-  __entry->name = sched_job->sched->name;
+  __assign_str(name, sched_job->sched->name);
   __entry->id = sched_job->id;
   __entry->fence = fence;
   __entry->ctx = fence->context;
   __entry->seqno = fence->seqno;
   ),
TP_printk("job ring=%s, id=%llu, depends fence=%p, context=%llu, 
seq=%u",
- __entry->name, __entry->id,
+ __get_str(name), __entry->id,
  __entry->fence, __entry->ctx,
  __entry->seqno)
 );
-- 
2.35.1.1178.g4f1659d476-goog



[PATCH 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-08 Thread Chia-I Wu
drm_sched_job and drm_run_job have the same prototype.

Signed-off-by: Chia-I Wu 
Cc: Rob Clark 
---
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   | 31 +--
 1 file changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h 
b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 877ce9b127f1..2e6eda920fe1 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -32,7 +32,7 @@
 #define TRACE_SYSTEM gpu_scheduler
 #define TRACE_INCLUDE_FILE gpu_scheduler_trace
 
-TRACE_EVENT(drm_sched_job,
+DECLARE_EVENT_CLASS(drm_sched_job_entity,
TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
TP_ARGS(sched_job, entity),
TP_STRUCT__entry(
@@ -59,31 +59,14 @@ TRACE_EVENT(drm_sched_job,
  __entry->job_count, __entry->hw_job_count)
 );
 
-TRACE_EVENT(drm_run_job,
+DEFINE_EVENT(drm_sched_job_entity, drm_sched_job,
TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
-   TP_ARGS(sched_job, entity),
-   TP_STRUCT__entry(
-__field(struct drm_sched_entity *, entity)
-__field(struct dma_fence *, fence)
-__field(const char *, name)
-__field(uint64_t, id)
-__field(u32, job_count)
-__field(int, hw_job_count)
-),
+   TP_ARGS(sched_job, entity)
+);
 
-   TP_fast_assign(
-  __entry->entity = entity;
-  __entry->id = sched_job->id;
-  __entry->fence = &sched_job->s_fence->finished;
-  __entry->name = sched_job->sched->name;
-  __entry->job_count = spsc_queue_count(&entity->job_queue);
-  __entry->hw_job_count = atomic_read(
-  &sched_job->sched->hw_rq_count);
-  ),
-   TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
- __entry->entity, __entry->id,
- __entry->fence, __entry->name,
- __entry->job_count, __entry->hw_job_count)
+DEFINE_EVENT(drm_sched_job_entity, drm_run_job,
+   TP_PROTO(struct drm_sched_job *sched_job, struct drm_sched_entity *entity),
+   TP_ARGS(sched_job, entity)
 );
 
 TRACE_EVENT(drm_sched_process_job,
-- 
2.35.1.1178.g4f1659d476-goog



Re: [PATCH] drm/virtio: Add execbuf flag to request no fence-event

2022-04-05 Thread Chia-I Wu
On Tue, Apr 5, 2022 at 10:38 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> It would have been cleaner to have a flag to *request* the fence event.
> But that ship has sailed.  So add a flag so that userspace which doesn't
> care about the events can opt-out.
>
> Signed-off-by: Rob Clark 
Reviewed-by: Chia-I Wu 

Might want to wait for Gurchetan to chime in as he added the mechanism.

> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 8 +---
>  include/uapi/drm/virtgpu_drm.h | 2 ++
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 3a8078f2ee27..09f1aa263f91 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -225,9 +225,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
> *dev, void *data,
> goto out_unresv;
> }
>
> -   ret = virtio_gpu_fence_event_create(dev, file, out_fence, ring_idx);
> -   if (ret)
> -   goto out_unresv;
> +   if (!(exbuf->flags & VIRTGPU_EXECBUF_NO_EVENT)) {
> +   ret = virtio_gpu_fence_event_create(dev, file, out_fence, 
> ring_idx);
> +   if (ret)
> +   goto out_unresv;
> +   }
>
> if (out_fence_fd >= 0) {
> sync_file = sync_file_create(&out_fence->f);
> diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
> index 0512fde5e697..d06cac3407cc 100644
> --- a/include/uapi/drm/virtgpu_drm.h
> +++ b/include/uapi/drm/virtgpu_drm.h
> @@ -52,10 +52,12 @@ extern "C" {
>  #define VIRTGPU_EXECBUF_FENCE_FD_IN0x01
>  #define VIRTGPU_EXECBUF_FENCE_FD_OUT   0x02
>  #define VIRTGPU_EXECBUF_RING_IDX   0x04
> +#define VIRTGPU_EXECBUF_NO_EVENT   0x08
>  #define VIRTGPU_EXECBUF_FLAGS  (\
> VIRTGPU_EXECBUF_FENCE_FD_IN |\
> VIRTGPU_EXECBUF_FENCE_FD_OUT |\
> VIRTGPU_EXECBUF_RING_IDX |\
> +   VIRTGPU_EXECBUF_NO_EVENT |\
> 0)
>
>  struct drm_virtgpu_map {
> --
> 2.35.1
>


Re: [PATCH] virtio-gpu: fix a missing check to avoid NULL dereference

2022-03-28 Thread Chia-I Wu
On Sat, Mar 26, 2022 at 10:09 PM Xiaomeng Tong  wrote:
>
> 'cache_ent' could be set NULL inside virtio_gpu_cmd_get_capset()
> and it will lead to a NULL dereference by a later use of it
> (i.e., ptr = cache_ent->caps_cache). Fix it with a NULL check.
>
> Fixes: 62fb7a5e10962 ("virtio-gpu: add 3d/virgl support")
> Signed-off-by: Xiaomeng Tong 
Reviewed-by: Chia-I Wu 
> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index c708bab555c6..b0f1c4d8fd23 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -579,8 +579,10 @@ static int virtio_gpu_get_caps_ioctl(struct drm_device 
> *dev,
> spin_unlock(&vgdev->display_info_lock);
>
> /* not in cache - need to talk to hw */
> -   virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
> +   ret = virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
>   &cache_ent);
> +   if (ret)
> +   return ret;
> virtio_gpu_notify(vgdev);
>
>  copy_exit:
>
> base-commit: f443e374ae131c168a065ea1748feac6b2e76613
> --
> 2.17.1
>


Re: [PATCH] drm/virtio: fix NULL pointer dereference in virtio_gpu_conn_get_modes

2022-03-24 Thread Chia-I Wu
On Wed, Mar 23, 2022 at 4:01 AM Liu Zixian  wrote:
> diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c 
> b/drivers/gpu/drm/virtio/virtgpu_display.c
> index 5b00310ac..f73352e7b 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_display.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_display.c
> @@ -179,6 +179,8 @@ static int virtio_gpu_conn_get_modes(struct drm_connector 
> *connector)
> DRM_DEBUG("add mode: %dx%d\n", width, height);
> mode = drm_cvt_mode(connector->dev, width, height, 60,
> false, false, false);
> +   if (!mode)
> +   return count;
> mode->type |= DRM_MODE_TYPE_PREFERRED;
> drm_mode_probed_add(connector, mode);
> count++;
Can we avoid early return here?  Something like

  mode = drm_cvt_mode(...);
  if (mode) {
DRM_DEBUG("add mode: %dx%d\n", width, height);
mode->type |= DRM_MODE_TYPE_PREFERRED;
drm_mode_probed_add(connector, mode);
count++;
  }

is more future proof.

> --
> 2.33.0
>


Re: [PATCH] drm/virtio: Add USE_INTERNAL blob flag

2022-02-22 Thread Chia-I Wu
On Fri, Feb 18, 2022 at 9:51 AM Rob Clark  wrote:
>
> On Fri, Feb 18, 2022 at 8:42 AM Chia-I Wu  wrote:
> >
> > On Fri, Feb 18, 2022 at 7:57 AM Rob Clark  wrote:
> > >
> > > From: Rob Clark 
> > >
> > > With native userspace drivers in guest, a lot of GEM objects need to be
> > > neither shared nor mappable.  And in fact making everything mappable
> > > and/or sharable results in unreasonably high fd usage in host VMM.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > > This is for a thing I'm working on, a new virtgpu context type that
> > > allows for running native userspace driver in the guest, with a
> > > thin shim in the host VMM.  In this case, the guest has a lot of
> > > GEM buffer objects which need to be neither shared nor mappable.
> > >
> > > Alternative idea is to just drop the restriction that blob_flags
> > > be non-zero.  I'm ok with either approach.
> > Dropping the restriction sounds better to me.
> >
> > What is the use case for such a resource?  Does the host need to know
> > such a resource exists?
>
> There are a bunch of use cases, some internal (like visibility stream
> buffers filled during binning pass and consumed during draw pass),
> some external (tiled and/or UBWC buffers are never accessed on the
> CPU).
For these use cases, it's true that userspace might want internal bos,
and serialize them as res_ids which the host maps to host gem_handles.
But userspace can also skip the internal bos and encode host
gem_handles directly.

But the kernel probably should not dictate what the userspace should
do by requiring non-zero blob flags.

>
> In theory, at least currently, drm/virtgpu does not need to know about
> them, but there are a lot of places in userspace that expect to have a
> gem handle.  Longer term, I think I want to extend virtgpu with
> MADVISE ioctl so we can track DONTNEED state in guest and only release
> buffers when host and/or guest is under memory pressure.  For that we
> will defn need guest side gem handles
MADVISE is a hint that userspace sets and is not based on memory
pressure.  It is the receivers of the hint who take actions when under
memory pressure.  I think it can be between the guest userspace and
the host?


>
> BR,
> -R
>
> > >
> > >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 7 ++-
> > >  include/uapi/drm/virtgpu_drm.h | 1 +
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> > > b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > index 69f1952f3144..92e1ba6b8078 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > @@ -36,7 +36,8 @@
> > >
> > >  #define VIRTGPU_BLOB_FLAG_USE_MASK (VIRTGPU_BLOB_FLAG_USE_MAPPABLE | \
> > > VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
> > > -   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
> > > +   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE | \
> > > +   VIRTGPU_BLOB_FLAG_USE_INTERNAL)
> > >
> > >  static int virtio_gpu_fence_event_create(struct drm_device *dev,
> > >  struct drm_file *file,
> > > @@ -662,6 +663,10 @@ static int verify_blob(struct virtio_gpu_device 
> > > *vgdev,
> > > params->size = rc_blob->size;
> > > params->blob = true;
> > > params->blob_flags = rc_blob->blob_flags;
> > > +
> > > +   /* USE_INTERNAL is local to guest kernel, don't pass to host: */
> > > +   params->blob_flags &= ~VIRTGPU_BLOB_FLAG_USE_INTERNAL;
> > > +
> > > return 0;
> > >  }
> > >
> > > diff --git a/include/uapi/drm/virtgpu_drm.h 
> > > b/include/uapi/drm/virtgpu_drm.h
> > > index 0512fde5e697..62b7483e5c60 100644
> > > --- a/include/uapi/drm/virtgpu_drm.h
> > > +++ b/include/uapi/drm/virtgpu_drm.h
> > > @@ -163,6 +163,7 @@ struct drm_virtgpu_resource_create_blob {
> > >  #define VIRTGPU_BLOB_FLAG_USE_MAPPABLE 0x0001
> > >  #define VIRTGPU_BLOB_FLAG_USE_SHAREABLE0x0002
> > >  #define VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE 0x0004
> > > +#define VIRTGPU_BLOB_FLAG_USE_INTERNAL 0x0008   /* not-mappable, 
> > > not-shareable */
> > > /* zero is invalid blob_mem */
> > > __u32 blob_mem;
> > > __u32 blob_flags;
> > > --
> > > 2.34.1
> > >


Re: [PATCH] drm/virtio: Remove restriction of non-zero blob_flags

2022-02-22 Thread Chia-I Wu
On Sat, Feb 19, 2022 at 9:02 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> With native userspace drivers in guest, a lot of GEM objects need to be
> neither shared nor mappable.  And in fact making everything mappable
> and/or sharable results in unreasonably high fd usage in host VMM.
>
> Signed-off-by: Rob Clark 
Reviewed-by: Chia-I Wu 


Re: [PATCH] drm/virtio: Add USE_INTERNAL blob flag

2022-02-18 Thread Chia-I Wu
On Fri, Feb 18, 2022 at 7:57 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> With native userspace drivers in guest, a lot of GEM objects need to be
> neither shared nor mappable.  And in fact making everything mappable
> and/or sharable results in unreasonably high fd usage in host VMM.
>
> Signed-off-by: Rob Clark 
> ---
> This is for a thing I'm working on, a new virtgpu context type that
> allows for running native userspace driver in the guest, with a
> thin shim in the host VMM.  In this case, the guest has a lot of
> GEM buffer objects which need to be neither shared nor mappable.
>
> Alternative idea is to just drop the restriction that blob_flags
> be non-zero.  I'm ok with either approach.
Dropping the restriction sounds better to me.

What is the use case for such a resource?  Does the host need to know
such a resource exists?

>
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 7 ++-
>  include/uapi/drm/virtgpu_drm.h | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 69f1952f3144..92e1ba6b8078 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -36,7 +36,8 @@
>
>  #define VIRTGPU_BLOB_FLAG_USE_MASK (VIRTGPU_BLOB_FLAG_USE_MAPPABLE | \
> VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
> -   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
> +   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE | \
> +   VIRTGPU_BLOB_FLAG_USE_INTERNAL)
>
>  static int virtio_gpu_fence_event_create(struct drm_device *dev,
>  struct drm_file *file,
> @@ -662,6 +663,10 @@ static int verify_blob(struct virtio_gpu_device *vgdev,
> params->size = rc_blob->size;
> params->blob = true;
> params->blob_flags = rc_blob->blob_flags;
> +
> +   /* USE_INTERNAL is local to guest kernel, don't pass to host: */
> +   params->blob_flags &= ~VIRTGPU_BLOB_FLAG_USE_INTERNAL;
> +
> return 0;
>  }
>
> diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
> index 0512fde5e697..62b7483e5c60 100644
> --- a/include/uapi/drm/virtgpu_drm.h
> +++ b/include/uapi/drm/virtgpu_drm.h
> @@ -163,6 +163,7 @@ struct drm_virtgpu_resource_create_blob {
>  #define VIRTGPU_BLOB_FLAG_USE_MAPPABLE 0x0001
>  #define VIRTGPU_BLOB_FLAG_USE_SHAREABLE0x0002
>  #define VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE 0x0004
> +#define VIRTGPU_BLOB_FLAG_USE_INTERNAL 0x0008   /* not-mappable, 
> not-shareable */
> /* zero is invalid blob_mem */
> __u32 blob_mem;
> __u32 blob_flags;
> --
> 2.34.1
>


Re: [PATCH v2 0/2] drm/gem-shmem: Various improvements

2022-02-10 Thread Chia-I Wu
On Wed, Feb 9, 2022 at 7:56 AM Thomas Zimmermann  wrote:
>
> Two patches for GEM's SHMEM-backed implementation.
>
> v2:
> * update drivers after vm_ops change
>
> Thomas Zimmermann (2):
>   drm/gem-shmem: Set vm_ops in static initializer
>   drm/gem-shmem: Don't store mmap'ed buffers in core dumps
>
>  drivers/gpu/drm/drm_gem_shmem_helper.c  | 7 ---
>  drivers/gpu/drm/lima/lima_gem.c | 1 +
>  drivers/gpu/drm/panfrost/panfrost_gem.c | 1 +
>  drivers/gpu/drm/v3d/v3d_bo.c| 1 +
>  drivers/gpu/drm/virtio/virtgpu_object.c | 1 +
>  include/drm/drm_gem_shmem_helper.h  | 2 ++
>  6 files changed, 10 insertions(+), 3 deletions(-)
Reviewed-by: Chia-I Wu 


>
> --
> 2.34.1
>


Re: [PATCH v2] drm/virtio: Fix an NULL vs IS_ERR() bug in virtio_gpu_object_shmem_init()

2021-11-23 Thread Chia-I Wu
On Thu, Nov 18, 2021 at 3:16 AM Dan Carpenter  wrote:
>
> The drm_gem_shmem_get_sg_table() function never returns NULL.  It returns
> error pointers on error.
>
> Fixes: c66df701e783 ("drm/virtio: switch from ttm to gem shmem helpers")
> Signed-off-by: Dan Carpenter 
> ---
> v2: I originally sent this patch on 19 Jun 2020 but it was somehow
> not applied.  As I review it now, I see that the bug is actually
> older than I originally thought and so I have updated the Fixes
> tag.
Reviewed-by: Chia-I Wu 


Re: [PATCH] drm/virtio: delay pinning the pages till first use

2021-11-02 Thread Chia-I Wu
On Tue, Nov 2, 2021 at 6:07 AM Gerd Hoffmann  wrote:
>
> On Tue, Nov 02, 2021 at 12:31:39PM +0100, Maksym Wezdecki wrote:
> > From: mwezdeck 
> >
> > The idea behind the commit:
> >   1. not pin the pages during resource_create ioctl
> >   2. pin the pages on the first use during:
> >   - transfer_*_host ioctl
> > - map ioctl
>
> i.e. basically lazy pinning.  Approach looks sane to me.
>
> >   3. introduce new ioctl for pinning pages on demand
>
> What is the use case for this ioctl?
> In any case this should be a separate patch.

Lazy pinning can be a nice optimization that userspace does not
necessarily need to know about.  This patch however skips pinning for
execbuffer ioctl and introduces a new pin ioctl instead.  That is a
red flag.


[PATCH] MAINTAINERS: add reviewers for virtio-gpu

2021-10-28 Thread Chia-I Wu
Add Gurchetan Singh and me as reviewers for virtio-gpu.

Signed-off-by: Chia-I Wu 
Acked-by: Gurchetan Singh 
Cc: David Airlie 
Cc: Gerd Hoffmann 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3b79fd441dde..5474a0a708a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19894,6 +19894,8 @@ F:  include/uapi/linux/virtio_gpio.h
 VIRTIO GPU DRIVER
 M: David Airlie 
 M: Gerd Hoffmann 
+R: Gurchetan Singh 
+R: Chia-I Wu 
 L: dri-devel@lists.freedesktop.org
 L: virtualizat...@lists.linux-foundation.org
 S: Maintained
-- 
2.33.0.1079.g6e70778dc9-goog



Re: drm/virtio: not pin pages on demand

2021-10-28 Thread Chia-I Wu
On Wed, Oct 27, 2021 at 4:12 AM Gerd Hoffmann  wrote:
>
> [ Cc'ing Gurchetan Singh ]
>
> > Can we follow up on this issue?
> >
> > The main pain point with your suggestion is the fact,
> > that it will cause VirGL protocol breakage and we would
> > like to avoid this.
> >
> > Extending execbuffer ioctl and create_resource ioctl is
> > more convenient than having the protocol broken.
>
> Do you know at resource creation time whenever you need cpu access
> or not?  IOW can we make that a resource property, so we don't have
> pass around lists of objects on each and every execbuffer ioctl?
Yes.  The userspace driver should be able to internally mark the
resource as NO TRANSFER and omit it from execbuffer.  It can be tricky
though because resource wait will no longer work.


>
> > Blob resources is not a solution, too, because QEMU doesn't
> > support blob resources right now.
> >
> > In ideal solution when both QEMU and crosvm support blob resources
> > we can switch to blob resources for textures.
>
> That'll only happen if someone works on it.
> I will not be able to do that though.
>
> > I would like to introduce changes gradually and not make a protocol
> > breakage.
>
> Well, opengl switching to blob resources is a protocol change anyway.
> That doesn't imply things will break though.  We have capsets etc to
> extend the protocol while maintaining backward compatibility.
>
> > What do you think about that?
>
> I still think that switching to blob resources would be the better
> solution.  Yes, it's alot of work and not something which helps
> short-term.  But adding a new API as interim solution isn't great
> either.  So, not sure.  Chia-I Wu?  Gurchetan Singh?
I can agree with that, although it depends on how much work needs to
happen in the userspace for virgl.

We will look into a userspace-only solution, or at least something not
involving uAPI additions.

>
>
> While being at it:  Chia-I Wu & Gurchetan Singh, could you help
> reviewing virtio-gpu kernel patches?  I think you both have a better
> view on the big picture (with both mesa and virglrenderer) than I have.
> Also for the crosvm side of things.  The procedure for that would be to
> send a patch adding yourself to the virtio-gpu section of the
> MAINTAINERS file, so scripts/get_maintainer.pl will Cc you on patches
> submitted.
Sure.  Will do.
>
> thanks,
>   Gerd
>
> >
> > Maksym
> >
> >
> > On 10/22/21 10:44 AM, Maksym Wezdecki wrote:
> >
> > > Once again with all lists and receivers. I'm sorry for that.
> > >
> > > On 10/21/21 6:42 PM, Chia-I Wu wrote:
> > >> On Thu, Oct 21, 2021 at 4:52 AM Gerd Hoffmann  wrote:
> > >>> On Thu, Oct 21, 2021 at 11:55:47AM +0200, Maksym Wezdecki wrote:
> > >>>> I get your point. However, we need to make resource_create ioctl,
> > >>>> in order to create corresponding resource on the host.
> > >>> That used to be the case but isn't true any more with the new
> > >>> blob resources.  virglrenderer allows to create gpu objects
> > >>> via execbuffer.  Those gpu objects can be linked to a
> > >>> virtio-gpu resources, but it's optional.  In your case you
> > >>> would do that only for your staging buffer.  The other textures
> > >>> (which you fill with a host-side copy from the staging buffer)
> > >>> do not need a virtio-gpu resource in the first place.
> > >> That's however a breaking change to the virgl protocol.  All virgl
> > >> commands expect res ids rather than blob ids.
> > >>
> > >> Using VIRTGPU_BLOB_MEM_HOST3D could in theory work.  But there are a
> > >> few occasions where virglrenderer expects there to be guest storage.
> > >> There are also readbacks that we need to support.  We might end up
> > >> using VIRTGPU_BLOB_MEM_HOST3D_GUEST, where pin-on-demand is still
> > >> desirable.
> > >>
> > >> For this patch, I think the uapi change can be simplified.  It can be
> > >> a new param plus a new field in drm_virtgpu_execbuffer
> > >>
> > >>   __u64 bo_flags; /* pointer to an array of size num_bo_handles, or NULL 
> > >> */
> > >>
> > >> The other changes do not seem needed.
> > > My first approach was the same, as you mentioned. However, having "__u64 
> > > bo_flags"
> > > needs a for loop, where only few of the bo-s needs to be pinned - this has
> > > performance implications. I would rather pass bo handles that should be 
> > > pinned than
> > > having a lot of flags, where only 1-2 bos needs to be pinned.
> > >
> > >>> take care,
> > >>>   Gerd
> > >>>
>
> --
>


Re: drm/virtio: not pin pages on demand

2021-10-21 Thread Chia-I Wu
On Thu, Oct 21, 2021 at 4:52 AM Gerd Hoffmann  wrote:
>
> On Thu, Oct 21, 2021 at 11:55:47AM +0200, Maksym Wezdecki wrote:
> > I get your point. However, we need to make resource_create ioctl,
> > in order to create corresponding resource on the host.
>
> That used to be the case but isn't true any more with the new
> blob resources.  virglrenderer allows to create gpu objects
> via execbuffer.  Those gpu objects can be linked to a
> virtio-gpu resources, but it's optional.  In your case you
> would do that only for your staging buffer.  The other textures
> (which you fill with a host-side copy from the staging buffer)
> do not need a virtio-gpu resource in the first place.
That's however a breaking change to the virgl protocol.  All virgl
commands expect res ids rather than blob ids.

Using VIRTGPU_BLOB_MEM_HOST3D could in theory work.  But there are a
few occasions where virglrenderer expects there to be guest storage.
There are also readbacks that we need to support.  We might end up
using VIRTGPU_BLOB_MEM_HOST3D_GUEST, where pin-on-demand is still
desirable.

For this patch, I think the uapi change can be simplified.  It can be
a new param plus a new field in drm_virtgpu_execbuffer

  __u64 bo_flags; /* pointer to an array of size num_bo_handles, or NULL */

The other changes do not seem needed.

>
> take care,
>   Gerd
>


Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-15 Thread Chia-I Wu
 i

On Tue, Sep 14, 2021 at 6:26 PM Gurchetan Singh
 wrote:
>
>
>
> On Tue, Sep 14, 2021 at 10:53 AM Chia-I Wu  wrote:
>>
>> ,On Mon, Sep 13, 2021 at 6:57 PM Gurchetan Singh
>>  wrote:
>> >
>> >
>> >
>> >
>> > On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu  wrote:
>> >>
>> >> .
>> >>
>> >> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
>> >>  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:
>> >> >>
>> >> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
>> >> >>  wrote:
>> >> >> >
>> >> >> > We don't want fences from different 3D contexts (virgl, gfxstream,
>> >> >> > venus) to be on the same timeline.  With explicit context creation,
>> >> >> > we can specify the number of rings each context wants.
>> >> >> >
>> >> >> > Execbuffer can specify which ring to use.
>> >> >> >
>> >> >> > Signed-off-by: Gurchetan Singh 
>> >> >> > Acked-by: Lingfeng Yang 
>> >> >> > ---
>> >> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
>> >> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 
>> >> >> > --
>> >> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
>> >> >> >
>> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
>> >> >> > b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> >> > index a5142d60c2fa..cca9ab505deb 100644
>> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> >> > @@ -56,6 +56,7 @@
>> >> >> >  #define STATE_ERR 2
>> >> >> >
>> >> >> >  #define MAX_CAPSET_ID 63
>> >> >> > +#define MAX_RINGS 64
>> >> >> >
>> >> >> >  struct virtio_gpu_object_params {
>> >> >> > unsigned long size;
>> >> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
>> >> >> > uint32_t ctx_id;
>> >> >> > uint32_t context_init;
>> >> >> > bool context_created;
>> >> >> > +   uint32_t num_rings;
>> >> >> > +   uint64_t base_fence_ctx;
>> >> >> > struct mutex context_lock;
>> >> >> >  };
>> >> >> >
>> >> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
>> >> >> > b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> >> > index f51f3393a194..262f79210283 100644
>> >> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> >> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct 
>> >> >> > drm_device *dev, void *data,
>> >> >> > int in_fence_fd = exbuf->fence_fd;
>> >> >> > int out_fence_fd = -1;
>> >> >> > void *buf;
>> >> >> > +   uint64_t fence_ctx;
>> >> >> > +   uint32_t ring_idx;
>> >> >> > +
>> >> >> > +   fence_ctx = vgdev->fence_drv.context;
>> >> >> > +   ring_idx = 0;
>> >> >> >
>> >> >> > if (vgdev->has_virgl_3d == false)
>> >> >> > return -ENOSYS;
>> >> >> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct 
>> >> >> > drm_device *dev, void *data,
>> >> >> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
>> >> >> > return -EINVAL;
>> >> >> >
>> >> >> > +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
>> >> >> > +   if (exbuf->ring_idx >= vfpriv->num_rings)
>> >> >> > +   return -EINVAL;
>> >> >> > +
>> >> >> > +   if (!vfpriv->base_fence_ctx)
>> >> >

Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-14 Thread Chia-I Wu
,On Mon, Sep 13, 2021 at 6:57 PM Gurchetan Singh
 wrote:
>
>
>
>
> On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu  wrote:
>>
>> .
>>
>> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
>>  wrote:
>> >
>> >
>> >
>> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:
>> >>
>> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
>> >>  wrote:
>> >> >
>> >> > We don't want fences from different 3D contexts (virgl, gfxstream,
>> >> > venus) to be on the same timeline.  With explicit context creation,
>> >> > we can specify the number of rings each context wants.
>> >> >
>> >> > Execbuffer can specify which ring to use.
>> >> >
>> >> > Signed-off-by: Gurchetan Singh 
>> >> > Acked-by: Lingfeng Yang 
>> >> > ---
>> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
>> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
>> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
>> >> > b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > index a5142d60c2fa..cca9ab505deb 100644
>> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > @@ -56,6 +56,7 @@
>> >> >  #define STATE_ERR 2
>> >> >
>> >> >  #define MAX_CAPSET_ID 63
>> >> > +#define MAX_RINGS 64
>> >> >
>> >> >  struct virtio_gpu_object_params {
>> >> > unsigned long size;
>> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
>> >> > uint32_t ctx_id;
>> >> > uint32_t context_init;
>> >> > bool context_created;
>> >> > +   uint32_t num_rings;
>> >> > +   uint64_t base_fence_ctx;
>> >> > struct mutex context_lock;
>> >> >  };
>> >> >
>> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
>> >> > b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > index f51f3393a194..262f79210283 100644
>> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct 
>> >> > drm_device *dev, void *data,
>> >> > int in_fence_fd = exbuf->fence_fd;
>> >> > int out_fence_fd = -1;
>> >> > void *buf;
>> >> > +   uint64_t fence_ctx;
>> >> > +   uint32_t ring_idx;
>> >> > +
>> >> > +   fence_ctx = vgdev->fence_drv.context;
>> >> > +   ring_idx = 0;
>> >> >
>> >> > if (vgdev->has_virgl_3d == false)
>> >> > return -ENOSYS;
>> >> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct 
>> >> > drm_device *dev, void *data,
>> >> > if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
>> >> > return -EINVAL;
>> >> >
>> >> > +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
>> >> > +   if (exbuf->ring_idx >= vfpriv->num_rings)
>> >> > +   return -EINVAL;
>> >> > +
>> >> > +   if (!vfpriv->base_fence_ctx)
>> >> > +   return -EINVAL;
>> >> > +
>> >> > +   fence_ctx = vfpriv->base_fence_ctx;
>> >> > +   ring_idx = exbuf->ring_idx;
>> >> > +   }
>> >> > +
>> >> > exbuf->fence_fd = -1;
>> >> >
>> >> > virtio_gpu_create_context(dev, file);
>> >> > @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct 
>> >> > drm_device *dev, void *data,
>> >> > goto out_memdup;
>> >> > }
>> >> >
>> >> > -   out_fence = virtio_gpu_fence_alloc(vgdev, 
>> >> > vgdev->fence_drv.context, 0);
>> >> > +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
>> >> > if(!out_fence) {

Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-13 Thread Chia-I Wu
.

On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
 wrote:
>
>
>
> On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu  wrote:
>>
>> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
>>  wrote:
>> >
>> > We don't want fences from different 3D contexts (virgl, gfxstream,
>> > venus) to be on the same timeline.  With explicit context creation,
>> > we can specify the number of rings each context wants.
>> >
>> > Execbuffer can specify which ring to use.
>> >
>> > Signed-off-by: Gurchetan Singh 
>> > Acked-by: Lingfeng Yang 
>> > ---
>> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
>> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
>> >  2 files changed, 35 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
>> > b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> > index a5142d60c2fa..cca9ab505deb 100644
>> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> > @@ -56,6 +56,7 @@
>> >  #define STATE_ERR 2
>> >
>> >  #define MAX_CAPSET_ID 63
>> > +#define MAX_RINGS 64
>> >
>> >  struct virtio_gpu_object_params {
>> > unsigned long size;
>> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
>> > uint32_t ctx_id;
>> > uint32_t context_init;
>> > bool context_created;
>> > +   uint32_t num_rings;
>> > +   uint64_t base_fence_ctx;
>> > struct mutex context_lock;
>> >  };
>> >
>> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

2021-09-10 Thread Chia-I Wu
On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
 wrote:
>
> We don't want fences from different 3D contexts (virgl, gfxstream,
> venus) to be on the same timeline.  With explicit context creation,
> we can specify the number of rings each context wants.
>
> Execbuffer can specify which ring to use.
>
> Signed-off-by: Gurchetan Singh 
> Acked-by: Lingfeng Yang 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 --
>  2 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index a5142d60c2fa..cca9ab505deb 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -56,6 +56,7 @@
>  #define STATE_ERR 2
>
>  #define MAX_CAPSET_ID 63
> +#define MAX_RINGS 64
>
>  struct virtio_gpu_object_params {
> unsigned long size;
> @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
> uint32_t ctx_id;
> uint32_t context_init;
> bool context_created;
> +   uint32_t num_rings;
> +   uint64_t base_fence_ctx;
> struct mutex context_lock;
>  };
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index f51f3393a194..262f79210283 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
> *dev, void *data,
> int in_fence_fd = exbuf->fence_fd;
> int out_fence_fd = -1;
> void *buf;
> +   uint64_t fence_ctx;
> +   uint32_t ring_idx;
> +
> +   fence_ctx = vgdev->fence_drv.context;
> +   ring_idx = 0;
>
> if (vgdev->has_virgl_3d == false)
> return -ENOSYS;
> @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
> *dev, void *data,
> if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
> return -EINVAL;
>
> +   if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
> +   if (exbuf->ring_idx >= vfpriv->num_rings)
> +   return -EINVAL;
> +
> +   if (!vfpriv->base_fence_ctx)
> +   return -EINVAL;
> +
> +   fence_ctx = vfpriv->base_fence_ctx;
> +   ring_idx = exbuf->ring_idx;
> +   }
> +
> exbuf->fence_fd = -1;
>
> virtio_gpu_create_context(dev, file);
> @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
> *dev, void *data,
> goto out_memdup;
> }
>
> -   out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 
> 0);
> +   out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
> if(!out_fence) {
> ret = -ENOMEM;
> goto out_unresv;
> @@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct 
> drm_device *dev,
> return -EINVAL;
>
> /* Number of unique parameters supported at this time. */
> -   if (num_params > 1)
> +   if (num_params > 2)
> return -EINVAL;
>
> ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
> @@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct 
> drm_device *dev,
>
> vfpriv->context_init |= value;
> break;
> +   case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
> +   if (vfpriv->base_fence_ctx) {
> +   ret = -EINVAL;
> +   goto out_unlock;
> +   }
> +
> +   if (value > MAX_RINGS) {
> +   ret = -EINVAL;
> +   goto out_unlock;
> +   }
> +
> +   vfpriv->base_fence_ctx = 
> dma_fence_context_alloc(value);
With multiple fence contexts, we should do something about implicit fencing.

The classic example is Mesa and X server.  When both use virgl and the
global fence context, no dma_fence_wait is fine.  But when Mesa uses
venus and the ring fence context, dma_fence_wait should be inserted.


> +   vfpriv->num_rings = value;
> +   break;
> default:
> ret = -EINVAL;
> goto out_unlock;
> --
> 2.33.0.153.gba50c8fa24-goog
>
>
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>


Re: [PATCH 0/7] dma-buf: Add an API for exporting sync files (v11)

2021-06-10 Thread Chia-I Wu
On Tue, May 25, 2021 at 2:18 PM Jason Ekstrand  wrote:
> Modern userspace APIs like Vulkan are built on an explicit
> synchronization model.  This doesn't always play nicely with the
> implicit synchronization used in the kernel and assumed by X11 and
> Wayland.  The client -> compositor half of the synchronization isn't too
> bad, at least on intel, because we can control whether or not i915
> synchronizes on the buffer and whether or not it's considered written.
We might have an important use case for this half, for virtio-gpu and Chrome OS.

When the guest compositor acts as a proxy to connect guest apps to the
host compositor, implicit fencing requires the guest compositor to do
a wait before forwarding the buffer to the host compositor.  With this
patch, the guest compositor can extract the dma-fence from the buffer,
and if the fence is a virtio-gpu fence, forward both the fence and the
buffer to the host compositor.  It will allow us to convert a
guest-side wait into a host-side wait.


[PATCH] drm/virtio: fix prime export for vram objects

2021-01-07 Thread Chia-I Wu
commit 16845c5d5409 ("drm/virtio: implement blob resources: implement
vram object") and commit c6069a02fa55 ("drm/virtgpu: Set PRIME export
function in struct drm_gem_object_funcs") landed from different trees,
resulting in prime export never working for vram objects.

Cc: Gurchetan Singh 
Cc: Thomas Zimmermann 
Cc: Gerd Hoffmann 
Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_vram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vram.c 
b/drivers/gpu/drm/virtio/virtgpu_vram.c
index d6f215c4ff8d..5cc34e7330fa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vram.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vram.c
@@ -69,6 +69,7 @@ static const struct drm_gem_object_funcs 
virtio_gpu_vram_funcs = {
.close = virtio_gpu_gem_object_close,
.free = virtio_gpu_vram_free,
.mmap = virtio_gpu_vram_mmap,
+   .export = virtgpu_gem_prime_export,
 };
 
 bool virtio_gpu_is_vram(struct virtio_gpu_object *bo)
-- 
2.29.2.729.g45daf8777d-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/virtio: make sure context is created in gem open

2021-01-07 Thread Chia-I Wu
The context might still be missing when DRM_IOCTL_PRIME_FD_TO_HANDLE is
the first ioctl on the drm_file.

Fixes: 72b48ae800da ("drm/virtio: enqueue virtio_gpu_create_context after the 
first 3D ioctl")
Cc: Gurchetan Singh 
Cc: Gerd Hoffmann 
Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_gem.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
b/drivers/gpu/drm/virtio/virtgpu_gem.c
index c30c75ee83fc..8502400b2f9c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -39,9 +39,6 @@ static int virtio_gpu_gem_create(struct drm_file *file,
int ret;
u32 handle;
 
-   if (vgdev->has_virgl_3d)
-   virtio_gpu_create_context(dev, file);
-
ret = virtio_gpu_object_create(vgdev, params, &obj, NULL);
if (ret < 0)
return ret;
@@ -119,6 +116,11 @@ int virtio_gpu_gem_object_open(struct drm_gem_object *obj,
if (!vgdev->has_virgl_3d)
goto out_notify;
 
+   /* the context might still be missing when the first ioctl is
+* DRM_IOCTL_MODE_CREATE_DUMB or DRM_IOCTL_PRIME_FD_TO_HANDLE
+*/
+   virtio_gpu_create_context(obj->dev, file);
+
objs = virtio_gpu_array_alloc(1);
if (!objs)
return -ENOMEM;
-- 
2.29.2.729.g45daf8777d-goog



[PATCH] drm/virtio: align blob resources to page sizes

2020-12-18 Thread Chia-I Wu
They trigger the BUG_ON() in drm_gem_private_object_init otherwise.

Signed-off-by: Chia-I Wu 
Cc: Gurchetan Singh 
Cc: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_vram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vram.c 
b/drivers/gpu/drm/virtio/virtgpu_vram.c
index 23c21bc4d01e..d6f215c4ff8d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vram.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vram.c
@@ -134,6 +134,8 @@ int virtio_gpu_vram_create(struct virtio_gpu_device *vgdev,
 
obj = &vram->base.base.base;
obj->funcs = &virtio_gpu_vram_funcs;
+
+   params->size = PAGE_ALIGN(params->size);
drm_gem_private_object_init(vgdev->ddev, obj, params->size);
 
/* Create fake offset */
-- 
2.29.2.684.gfbc64c5ab5-goog



Re: [PATCH v2 22/23] drm/virtio: implement blob resources: resource create blob ioctl

2020-09-03 Thread Chia-I Wu
On Wed, Sep 2, 2020 at 2:09 PM Gurchetan Singh
 wrote:
>
> From: Gerd Hoffmann 
>
> Implement resource create blob as specified.
>
> Signed-off-by: Gerd Hoffmann 
> Co-developed-by: Gurchetan Singh 
> Signed-off-by: Gurchetan Singh 
> Acked-by: Tomeu Vizoso 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h|   4 +-
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c  | 136 
>  drivers/gpu/drm/virtio/virtgpu_object.c |   5 +-
>  drivers/gpu/drm/virtio/virtgpu_vram.c   |   2 +
>  4 files changed, 144 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 6162865c162df..d2ea199dbdb90 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -257,8 +257,8 @@ struct virtio_gpu_fpriv {
> struct mutex context_lock;
>  };
>
> -/* virtgpu_ioctl.c */
> -#define DRM_VIRTIO_NUM_IOCTLS 10
> +/* virtio_ioctl.c */
> +#define DRM_VIRTIO_NUM_IOCTLS 11
>  extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
>  void virtio_gpu_create_context(struct drm_device *dev, struct drm_file 
> *file);
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 7dbe24248a200..442cbca59c8a5 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -34,6 +34,10 @@
>
>  #include "virtgpu_drv.h"
>
> +#define VIRTGPU_BLOB_FLAG_USE_MASK (VIRTGPU_BLOB_FLAG_USE_MAPPABLE | \
> +   VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
> +   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
> +
>  void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file)
>  {
> struct virtio_gpu_device *vgdev = dev->dev_private;
> @@ -520,6 +524,134 @@ static int virtio_gpu_get_caps_ioctl(struct drm_device 
> *dev,
> return 0;
>  }
>
> +static int verify_blob(struct virtio_gpu_device *vgdev,
> +  struct virtio_gpu_fpriv *vfpriv,
> +  struct virtio_gpu_object_params *params,
> +  struct drm_virtgpu_resource_create_blob *rc_blob,
> +  bool *guest_blob, bool *host3d_blob)
> +{
> +   if (!vgdev->has_resource_blob)
> +   return -EINVAL;
> +
> +   if ((rc_blob->blob_flags & ~VIRTGPU_BLOB_FLAG_USE_MASK) ||
> +   !rc_blob->blob_flags)
> +   return -EINVAL;
> +
> +   if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE) {
> +   if (!vgdev->has_resource_assign_uuid)
> +   return -EINVAL;
> +   }
> +
> +   switch (rc_blob->blob_mem) {
> +   case VIRTGPU_BLOB_MEM_GUEST:
> +   *guest_blob = true;
> +   break;
> +   case VIRTGPU_BLOB_MEM_HOST3D_GUEST:
> +   *guest_blob = true;
> +   fallthrough;
> +   case VIRTGPU_BLOB_MEM_HOST3D:
> +   *host3d_blob = true;
> +   break;
> +   default:
> +   return -EINVAL;
> +   }
> +
> +   if (*host3d_blob) {
> +   if (!vgdev->has_virgl_3d)
> +   return -EINVAL;
> +
> +   /* Must be dword aligned. */
> +   if (rc_blob->cmd_size % 4 != 0)
> +   return -EINVAL;
> +
> +   params->ctx_id = vfpriv->ctx_id;
> +   params->blob_id = rc_blob->blob_id;
> +   } else {
> +   if (rc_blob->blob_id != 0)
> +   return -EINVAL;
> +
> +   if (rc_blob->cmd_size != 0)
> +   return -EINVAL;
> +   }
> +
> +   params->blob_mem = rc_blob->blob_mem;
> +   params->size = rc_blob->size;
> +   params->blob = true;
> +   params->blob_flags = rc_blob->blob_flags;
> +   return 0;
> +}
> +
> +static int virtio_gpu_resource_create_blob(struct drm_device *dev,
> +  void *data, struct drm_file *file)
> +{
> +   int ret = 0;
> +   uint32_t handle = 0;
> +   bool guest_blob = false;
> +   bool host3d_blob = false;
> +   struct drm_gem_object *obj;
> +   struct virtio_gpu_object *bo;
> +   struct virtio_gpu_object_params params = { 0 };
> +   struct virtio_gpu_device *vgdev = dev->dev_private;
> +   struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> +   struct drm_virtgpu_resource_create_blob *rc_blob = data;
> +
> > +   if (verify_blob(vgdev, vfpriv, &params, rc_blob,
> > +   &guest_blob, &host3d_blob))
> +   return -EINVAL;
> +
> +   if (vgdev->has_virgl_3d)
> +   virtio_gpu_create_context(dev, file);
> +
> +   if (rc_blob->cmd_size) {
> +   void *buf;
> +
> +   buf = memdup_user(u64_to_user_ptr(rc_blob->cmd),
> + rc_blob->cmd_size);
> +
> +   if (IS_ERR(buf))
> +   return PTR_ERR(buf);
> +
> 

Re: pages pinned for BO lifetime and security

2020-07-22 Thread Chia-I Wu
On Wed, Jul 22, 2020 at 4:28 AM Daniel Vetter  wrote:
>
> On Wed, Jul 22, 2020 at 1:12 PM Christian König
>  wrote:
> >
> > Am 22.07.20 um 09:32 schrieb Daniel Vetter:
> > > On Wed, Jul 22, 2020 at 9:19 AM Christian König
> > >  wrote:
> > >> Am 22.07.20 um 02:22 schrieb Gurchetan Singh:
> > >>
> > >> +Christian who added DMABUF_MOVE_NOTIFY which added the relevant blurb:
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma-buf/Kconfig#n46
> > >>
> > >> Currently, the user seems to amdgpu for P2P dma-buf and it seems to 
> > >> plumb ttm (*move_notify) callback to dma-buf.  We're not sure if it's a 
> > >> security issue occurring across DRM drivers, or one more specific to the 
> > >> new amdgpu use case.
> > >>
> > >> On Tue, Jul 21, 2020 at 1:03 PM Chia-I Wu  wrote:
> > >>> Hi list,
> > >>>
> > >>> virtio-gpu is moving in the direction where BO pages are pinned for
> > >>> the lifetime for simplicity.  I am wondering if that is considered a
> > >>> security issue in general, especially after running into the
> > >>> description of the new DMABUF_MOVE_NOTIFY config option.
> > >>
> > >> Yes, that is generally considered a denial of service possibility and so 
> > >> far Dave and Daniel have rejected all tries to upstream stuff like this 
> > >> as far as I know.
> > > Uh we have merged pretty much all arm-soc drivers without real
> > > shrinkers. Whether that was a good idea or not is maybe different
> > > question - now that we do have pretty good helpers maybe we should
> > > poke this a bit more. But then SoCs Suck (tm).
> >
> > I was under the impression that those SoC drivers still use the GEM
> > helpers which unpinns stuff when it is not in use. But I might be wrong.
>
> It's kinda mostly there, even some helpers for shrinking but a)
> helpers on, not all drivers use it b) for purgeable objects only, not
> generally for inactive stuff - there's no active use tracking c) cma
> helpers (ok that one is only for vc4 as the render driver) don't even
> have that. I had some slow burner series to get us towards dma_resv
> locking in shmem helpers and then maybe even a common shrinker helper
> with some "actually kick it out now" callback, but yeah never got
> there.
My quick survey of the SoC drivers also told me that they tend to
demonstrate a) or b).

About b), I was thinking maybe that's because the systems the drivers
run on are historically swap-less.  There is no place to write the
dirty pages back to and thus less incentive to support shrinking
inactive objects.

You both mentioned that the lack of swap is irrelevant (or at least
not the only factor).  Can you elaborate a little bit on that?
Shrinking inactive objects returns the pages to swap cache... hmm, I
guess that helps memory defragmentation?

>
> So maybe per-device object shrinker helper would be something neat we
> could lift out of ttm (when it's happening), maybe with a simple
> callback somewhere in it's lru tracking. Probably best if the shrinker
> lru is outright separate from anything else or it just gets messy.
> -Daniel
>
> > > But for real gpus they do indeed all have shrinkers, and not just "pin
> > > everything forever" model. Real gpus = stuff you might run on servers
> > > or multi-app and all that stuff, not with a simple "we just kill all
> > > background jobs if memory gets low" model like on android and other
> > > such things.
> > >
> > >> DMA-buf an pinning for scanout are the only exceptions since the 
> > >> implementation wouldn't have been possible otherwise.
> > >>
> > >>> Most drivers do not have a shrinker, or whether a BO is purgeable is
> > >>> entirely controlled by the userspace (madvice).  They can be
> > >>> categorized as "a security problem where userspace is able to pin
> > >>> unrestricted amounts of memory".  But those drivers are normally found
> > >>> on systems without swap.  I don't think the issue applies.

pages pinned for BO lifetime and security

2020-07-21 Thread Chia-I Wu
Hi list,

virtio-gpu is moving in the direction where BO pages are pinned for
the lifetime for simplicity.  I am wondering if that is considered a
security issue in general, especially after running into the
description of the new DMABUF_MOVE_NOTIFY config option.

Most drivers do not have a shrinker, or whether a BO is purgeable is
entirely controlled by the userspace (madvice).  They can be
categorized as "a security problem where userspace is able to pin
unrestricted amounts of memory".  But those drivers are normally found
on systems without swap.  I don't think the issue applies.

Of the desktop GPU drivers, i915's shrinker certainly supports purging
to swap.  TTM is a bit hard to follow.  I can't really tell if amdgpu
or nouveau supports that.  virtio-gpu is more commonly found on
systems with swaps so I think it should follow the desktop practices?

Truth is, the emulated virtio-gpu device always supports page moves
with VIRTIO_GPU_CMD_RESOURCE_{ATTACH,DETACH}_BACKING.  It is just that
the driver does not make use of them.  That makes this less of an
issue because the driver can be fixed anytime (finger crossed that the
emulator won't have bugs in these untested paths).  This issue becomes
more urgent because we are considering adding a new HW command[1]
where page moves will be disallowed.  We definitely don't want a HW
command that is inherently insecure, if BO pages pinned for the
lifetime is considered a security issue on desktops.

[1] VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
https://gitlab.freedesktop.org/virgl/drm-misc-next/-/blob/virtio-gpu-next/include/uapi/linux/virtio_gpu.h#L396


Re: [PATCH] drm/virtio: delete notify in virtio_gpu_object_create

2020-03-25 Thread Chia-I Wu
On Thu, Mar 26, 2020 at 7:10 AM Gurchetan Singh
 wrote:
>
> For 3D buffers, virtio_gpu_gem_object_open notifies.
> We can have the same behavior for dumb buffer.
>
> v2: virtio_gpu_gem_object_open always notifies
> v3: avoid boolean variable
Series is

Reviewed-by: Chia-I Wu 

>
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_gem.c| 3 ++-
>  drivers/gpu/drm/virtio/virtgpu_object.c | 1 -
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
> b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 90c0a8ea1708c..1025658be4df2 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -114,7 +114,7 @@ int virtio_gpu_gem_object_open(struct drm_gem_object *obj,
> struct virtio_gpu_object_array *objs;
>
> if (!vgdev->has_virgl_3d)
> -   return 0;
> +   goto out_notify;
>
> objs = virtio_gpu_array_alloc(1);
> if (!objs)
> @@ -123,6 +123,7 @@ int virtio_gpu_gem_object_open(struct drm_gem_object *obj,
>
> virtio_gpu_cmd_context_attach_resource(vgdev, vfpriv->ctx_id,
>objs);
> +out_notify:
> virtio_gpu_notify(vgdev);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index d9039bb7c5e37..51a8da7d5ef3b 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -241,7 +241,6 @@ int virtio_gpu_object_create(struct virtio_gpu_device 
> *vgdev,
> return ret;
> }
>
> -   virtio_gpu_notify(vgdev);
> *bo_ptr = bo;
> return 0;
>
> --
> 2.24.1
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/virtio: delete notify in virtio_gpu_object_create

2020-03-25 Thread Chia-I Wu
On Wed, Mar 25, 2020 at 8:41 AM Gurchetan Singh
 wrote:
>
> For 3D buffers, virtio_gpu_gem_object_open notifies.
> We can have the same behavior for dumb buffer.  We just
> need to make sure the first open notifies the host for
> dumb buffers.
virtio_gpu_notify is cheap and does not kick unless there is a need.
I probably won't bother with adding `bool notified', which adds a
(harmless) data race.




>
> v2: virtio_gpu_gem_object_open always notifies
>
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h|  1 +
>  drivers/gpu/drm/virtio/virtgpu_gem.c| 10 --
>  drivers/gpu/drm/virtio/virtgpu_object.c |  1 -
>  3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 79ad176aca5a8..842200e01d785 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -71,6 +71,7 @@ struct virtio_gpu_object {
> uint32_t hw_res_handle;
> bool dumb;
> bool created;
> +   bool notified;
>  };
>  #define gem_to_virtio_gpu_obj(gobj) \
> container_of((gobj), struct virtio_gpu_object, base.base)
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
> b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 90c0a8ea1708c..597ddb7391fb9 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -111,10 +111,13 @@ int virtio_gpu_gem_object_open(struct drm_gem_object 
> *obj,
>  {
> struct virtio_gpu_device *vgdev = obj->dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> +   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> struct virtio_gpu_object_array *objs;
>
> -   if (!vgdev->has_virgl_3d)
> -   return 0;
> +   if (!vgdev->has_virgl_3d && !bo->notified)
> +   goto out_notify;
> +   else if (!vgdev->has_virgl_3d)
> +   goto out;
>
> objs = virtio_gpu_array_alloc(1);
> if (!objs)
> @@ -123,7 +126,10 @@ int virtio_gpu_gem_object_open(struct drm_gem_object 
> *obj,
>
> virtio_gpu_cmd_context_attach_resource(vgdev, vfpriv->ctx_id,
>objs);
> +out_notify:
> +   bo->notified = true;
> virtio_gpu_notify(vgdev);
> +out:
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index d9039bb7c5e37..51a8da7d5ef3b 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -241,7 +241,6 @@ int virtio_gpu_object_create(struct virtio_gpu_device 
> *vgdev,
> return ret;
> }
>
> -   virtio_gpu_notify(vgdev);
> *bo_ptr = bo;
> return 0;
>
> --
> 2.24.1
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/3] drm/virtio: delete notify in virtio_gpu_object_create

2020-03-24 Thread Chia-I Wu
On Wed, Mar 25, 2020 at 12:44 AM Gurchetan Singh
 wrote:
>
> - For dumb buffers, virtio_gpu_gem_create can notify.
> - For 3D buffers, virtio_gpu_gem_object_open can notify.
Hmm, I feel this is a bit complex.  virtio_gpu_gem_object_open may not
notify, and the caller needs to know about it.

Can we change it to virtio_gpu_object_create never notifies and
virtio_gpu_gem_object_open always notifies?

>
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_gem.c| 1 +
>  drivers/gpu/drm/virtio/virtgpu_object.c | 1 -
>  2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
> b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 90c0a8ea1708..aa14dd12928e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -49,6 +49,7 @@ static int virtio_gpu_gem_create(struct drm_file *file,
> return ret;
> }
>
> +   virtio_gpu_notify(vgdev);
> *obj_p = &obj->base.base;
>
> /* drop reference from allocate - handle holds it now */
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index d9039bb7c5e3..51a8da7d5ef3 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -241,7 +241,6 @@ int virtio_gpu_object_create(struct virtio_gpu_device 
> *vgdev,
> return ret;
> }
>
> -   virtio_gpu_notify(vgdev);
> *bo_ptr = bo;
> return 0;
>
> --
> 2.25.1.696.g5e7596f4ac-goog
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC PATCH 0/8] *** Per context fencing ***

2020-03-17 Thread Chia-I Wu
On Mon, Mar 16, 2020 at 3:44 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > >> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
> > >> virtio command when it finished rendering, period.
> > >>
> > >>
> > >> On the guest side we don't need the fence_id.  The completion callback
> > >> gets passed the virtio_gpu_vbuffer, so it can figure which command did
> > >> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
> > >>
> > >> On the host side we depend on the fence_id right now, but only because
> > >> that is the way the virgl_renderer_callbacks->write_fence interface is
> > >> designed.  We have to change that anyway for per-context (or whatever)
> > >> fences, so it should not be a problem to drop the fence_id dependency
> > >> too and just pass around an opaque pointer instead.
> >
> > I am still catching up, but IIUC, indeed I don't think the host needs
> > to depend on fence_id.  We should be able to repurpose fence_id.
>
> I'd rather ignore it altogether for FENCE_V2 (or whatever we call the
> feature flag).

That's fine when one assumes each virtqueue has one host GPU timeline.
But when there are multiple (e.g., multiplexing multiple contexts over
one virtqueue, or multiple VkQueues), fence_id can be repurposed as
the host timeline id.

>
> > On the other hand, the VIRTIO_GPU_FLAG_FENCE flag is interesting, and
> > it indicates that the vbuf is on the host GPU timeline instead of the
> > host CPU timeline.
>
> Yep, we have to keep that (unless we do command completion on GPU
> timeline unconditionally with FENCE_V2).

I think it will be useful when EXECBUFFER is used for metadata query
and write the metadata directly to a guest BO's sg list.  We want the
query to be on the CPU timeline.




> cheers,
>   Gerd
>


Re: [RFC PATCH 0/8] *** Per context fencing ***

2020-03-13 Thread Chia-I Wu
On Thu, Mar 12, 2020 at 4:08 PM Gurchetan Singh
 wrote:
>
>
>
> On Thu, Mar 12, 2020 at 2:29 AM Gerd Hoffmann  wrote:
>>
>> On Wed, Mar 11, 2020 at 04:36:16PM -0700, Gurchetan Singh wrote:
>> > On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann  wrote:
>> >
>> > >   Hi,
>> > >
>> > > > I should've been more clear -- this is an internal cleanup/preparation
>> > > and
>> > > > the per-context changes are invisible to host userspace.
>> > >
>> > > Ok, it wasn't clear that you don't flip the switch yet.  In general the
>> > > commit messages could be a bit more verbose ...
>> > >
>> > > I'm wondering though why we need the new fence_id in the first place.
>> > > Isn't it enough to have per-context (instead of global) last_seq?
>> > >
>> >
>> > Heh, that was to leave open the possibility of multiple timelines per
>> > context.  Roughly speaking,
>> >
>> > V2 -- multiple processes
>> > V3 -- multiple processes and multiple threads (due to VK multi-threaded
>> > command buffers)
>> >
>> > I think we all agree on V2.  It seems we still have to discuss V3
>> > (multi-queue, thread pools, a fence context associated with each thread) a
>> > bit more before we start landing pieces.
>>
>> While thinking about the whole thing a bit more ...
>> Do we need virtio_gpu_ctrl_hdr->fence_id at all?
>
>
> A fence ID could be useful for sharing fences across virtio devices.  Think 
> FENCE_ASSIGN_UUID, akin to  RESOURCE_ASSIGN_UUID (+dstevens@).
>
>>
>> At virtio level it is pretty simple:  The host completes the SUBMIT_3D
>> virtio command when it finished rendering, period.
>>
>>
>> On the guest side we don't need the fence_id.  The completion callback
>> gets passed the virtio_gpu_vbuffer, so it can figure which command did
>> actually complete without looking at virtio_gpu_ctrl_hdr->fence_id.
>>
>> On the host side we depend on the fence_id right now, but only because
>> that is the way the virgl_renderer_callbacks->write_fence interface is
>> designed.  We have to change that anyway for per-context (or whatever)
>> fences, so it should not be a problem to drop the fence_id dependency
>> too and just pass around an opaque pointer instead.

I am still catching up, but IIUC, indeed I don't think the host needs
to depend on fence_id.  We should be able to repurpose fence_id.  On
the other hand, the VIRTIO_GPU_FLAG_FENCE flag is interesting, and it
indicates that the vbuf is on the host GPU timeline instead of the
host CPU timeline.

>
>
> For multiple GPU timelines per context, the (process local) sync object 
> handle looks interesting:
>
> https://patchwork.kernel.org/patch/9758565/
>
> Some have extended EXECBUFFER to support this flow:
>
> https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstr...@intel.com

I think this only affects the kernel/userspace interface?  I know
there were works being done to support VK_KHR_timeline semaphore,
which is something we definitely want.  I don't know if it is the only
way for the userspace to gain the extension support.  I need to do my
homework...



>
>>
>> cheers,
>>   Gerd
>>


Re: [RFC PATCH 0/8] *** Per context fencing ***

2020-03-11 Thread Chia-I Wu
On Wed, Mar 11, 2020 at 4:36 PM Gurchetan Singh
 wrote:
>
>
>
> On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann  wrote:
>>
>>   Hi,
>>
>> > I should've been more clear -- this is an internal cleanup/preparation and
>> > the per-context changes are invisible to host userspace.
>>
>> Ok, it wasn't clear that you don't flip the switch yet.  In general the
>> commit messages could be a bit more verbose ...
>>
>> I'm wondering though why we need the new fence_id in the first place.
>> Isn't it enough to have per-context (instead of global) last_seq?
>
>
> Heh, that was to leave open the possibility of multiple timelines per 
> context.  Roughly speaking,
Yeah, I think we will need multiple timelines per context.

>
> V2 -- multiple processes
> V3 -- multiple processes and multiple threads (due to VK multi-threaded 
> command buffers)
>
> I think we all agree on V2.  It seems we still have to discuss V3 
> (multi-queue, thread pools, a fence context associated with each thread) a 
> bit more before we start landing pieces.
In addition to multiple threads, we should also consider multiple VkQueues.

I will start with... how many timelines do we want to expose per
context?  In my mind, it goes like

V1: 1 timeline per virtqueue (there is one timeline for ctrlq right now)
V2: 1 timeline per context (VK and GL on different timelines)
V3: N timelines per context (each VkQueue in a VK context gets a timeline?)
V4: N+M timelines per context (each CPU thread also gets a timeline?!?!)

I certainly don't know if V4 is a good idea or not...



>
>>
>> > Multi-queue sounds very interesting indeed, especially with VK
>> > multi-threaded command submission.  That to me is V3 rather than V2.. let's
>> > start easy!
>>
>> Having v2 if we plan to obsolete it with v3 soon doesn't look like a
>> good plan to me.  It'll make backward compatibility more complex for
>> no good reason ...
>>
>> Also: Does virglrenderer render different contexts in parallel today?
>> Only in case it does we'll actually get benefits from per-context
>> fences.  But I think it doesn't, so there is no need to rush.
>>
>> I think we should better have a rough plan for parallel rendering first,
>> then go start implementing the pieces needed.
>>
>> cheers,
>>   Gerd
>>


Re: [RFC PATCH 0/8] *** Per context fencing ***

2020-03-11 Thread Chia-I Wu
On Wed, Mar 11, 2020 at 3:36 AM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > I should've been more clear -- this is an internal cleanup/preparation and
> > the per-context changes are invisible to host userspace.
>
> Ok, it wasn't clear that you don't flip the switch yet.  In general the
> commit messages could be a bit more verbose ...
>
> I'm wondering though why we need the new fence_id in the first place.
> Isn't it enough to have per-context (instead of global) last_seq?
>
> > Multi-queue sounds very interesting indeed, especially with VK
> > multi-threaded command submission.  That to me is V3 rather than V2.. let's
> > start easy!
>
> Having v2 if we plan to obsolete it with v3 soon doesn't look like a
> good plan to me.  It'll make backward compatibility more complex for
> no good reason ...
I agree we want to study multi-queue a little bit before doing v2.  If
we do decide that multi-queue will be v3, we should at least design v2
in a forward-compatible way.

Every VK context (or GL context if we go multi-process GL) is
isolated.  I think there will need to be at least one virtqueue for
each VK context.  Can virtqueues be added dynamically?  Or can we have
fixed but enough (e.g., 64) virtqueues?

Multi-threaded command submission is not helped by multi-queue unless
we go with one virtqueue for each VkQueue in a VK context.  Otherwise,
multi-queue only makes context scheduling easier, which is not a
priority yet IMO.


>
> Also: Does virglrenderer render different contexts in parallel today?
> Only in case it does we'll actually get benefits from per-context
> fences.  But I think it doesn't, so there is no need to rush.
>
> I think we should better have a rough plan for parallel rendering first,
> then go start implementing the pieces needed.
It will be soon.  Each VK context will be rendered by a different
renderer process.  Besides, VK contexts and GL contexts are not on the
same timeline.  We don't want one to delay another by presenting a
unified timeline.


>
> cheers,
>   Gerd
>


Re: [PATCH 2/2] [RFC] drm/virtgpu: modify uapi with stride/layer_stride fix

2020-02-28 Thread Chia-I Wu
On Wed, Oct 2, 2019 at 5:18 PM Gurchetan Singh
 wrote:
>
> On Wed, Oct 2, 2019 at 1:49 AM Gerd Hoffmann  wrote:
> >
> > On Tue, Oct 01, 2019 at 06:49:35PM -0700, Gurchetan Singh wrote:
> > > This doesn't really break userspace, since it always passes down
> > > 0 for stride/layer_stride currently. We could:
> > >
> > > (1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
> >
> > This I think.
> > But IMO it's not a fix, it is an added feature ...
> >
> > Also missing the big picture here.  Why do we need this?
>
> Two reasons:
I don't fully get the picture, but drm_virtgpu_resource_create has
stride.  Can we send that down in transfers?  It lacks layer_stride
though...


Re: [PATCH v5 1/3] drm/shmem: add support for per object caching flags.

2020-02-26 Thread Chia-I Wu
On Wed, Feb 26, 2020 at 10:25 AM Thomas Hellström (VMware)
 wrote:
>
> Hi, Gerd,
>
> While looking at this patchset I came across some stuff that seems
> strange but that was merged in a previous patchset.
>
> (please refer to
> https://lists.freedesktop.org/archives/dri-devel/2018-September/190001.html.
> Forgive me if I've missed any discussion leading up to this).
>
>
> On 2/26/20 4:47 PM, Gerd Hoffmann wrote:
> > Add map_cached bool to drm_gem_shmem_object, to request cached mappings
> > on a per-object base.  Check the flag before adding writecombine to
> > pgprot bits.
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Gerd Hoffmann 
> > ---
> >   include/drm/drm_gem_shmem_helper.h |  5 +
> >   drivers/gpu/drm/drm_gem_shmem_helper.c | 15 +++
> >   2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/drm/drm_gem_shmem_helper.h 
> > b/include/drm/drm_gem_shmem_helper.h
> > index e34a7b7f848a..294b2931c4cc 100644
> > --- a/include/drm/drm_gem_shmem_helper.h
> > +++ b/include/drm/drm_gem_shmem_helper.h
> > @@ -96,6 +96,11 @@ struct drm_gem_shmem_object {
> >* The address are un-mapped when the count reaches zero.
> >*/
> >   unsigned int vmap_use_count;
> > +
> > + /**
> > +  * @map_cached: map object cached (instead of using writecombine).
> > +  */
> > + bool map_cached;
> >   };
> >
> >   #define to_drm_gem_shmem_obj(obj) \
> > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> > b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > index a421a2eed48a..aad9324dcf4f 100644
> > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > @@ -254,11 +254,16 @@ static void *drm_gem_shmem_vmap_locked(struct 
> > drm_gem_shmem_object *shmem)
> >   if (ret)
> >   goto err_zero_use;
> >
> > - if (obj->import_attach)
> > + if (obj->import_attach) {
> >   shmem->vaddr = dma_buf_vmap(obj->import_attach->dmabuf);
> > - else
> > + } else {
> > + pgprot_t prot = PAGE_KERNEL;
> > +
> > + if (!shmem->map_cached)
> > + prot = pgprot_writecombine(prot);
> >   shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT,
> > - VM_MAP, pgprot_writecombine(PAGE_KERNEL));
> > + VM_MAP, prot);
>
>
> Wouldn't a vmap with pgprot_writecombine() create conflicting mappings
> with the linear kernel map which is not write-combined? Or do you change
> the linear kernel map of the shmem pages somewhere? vmap bypassess at
> least the x86 PAT core mapping consistency check and this could
> potentially cause spuriously overwritten memory.

Yeah, I think this creates a conflicting alias.  It seems a call to
set_pages_array_wc() here, or changes elsewhere, is needed.

But this is a pre-existing issue in the shmem helper.  There is also
no universal fix (e.g., set_pages_array_wc is x86 only)?  I would hope
this series can be merged sooner to fix the regression first.

>
>
> > + }
> >
> >   if (!shmem->vaddr) {
> >   DRM_DEBUG_KMS("Failed to vmap pages\n");
> > @@ -540,7 +545,9 @@ int drm_gem_shmem_mmap(struct drm_gem_object *obj, 
> > struct vm_area_struct *vma)
> >   }
> >
> >   vma->vm_flags |= VM_MIXEDMAP | VM_DONTEXPAND;
> > - vma->vm_page_prot = 
> > pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> > + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> > + if (!shmem->map_cached)
> > + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>
> Same thing here. Note that vmf_insert_page() which is used by the fault
> handler also bypasses the x86 PAT  consistency check, whereas
> vmf_insert_mixed() doesn't.
>
> >   vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
>
> At least with SME or SEV encryption, where shmem memory has its kernel
> map set to encrypted, creating conflicting mappings is explicitly
> disallowed.
> BTW, why is mmap mapping decrypted while vmap isn't?
>
> >   vma->vm_ops = &drm_gem_shmem_vm_ops;
> >
>
> Thanks,
> Thomas
>
>
>


Re: [PATCH 4/4 v6] drm/virtio: enqueue virtio_gpu_create_context after the first 3D ioctl

2020-02-24 Thread Chia-I Wu
On Mon, Feb 24, 2020 at 5:24 AM Emil Velikov  wrote:
>
> On Mon, 24 Feb 2020 at 11:06, Gerd Hoffmann  wrote:
> >
> > On Fri, Feb 21, 2020 at 04:54:02PM -0800, Gurchetan Singh wrote:
> > > On Fri, Feb 21, 2020 at 3:06 PM Chia-I Wu  wrote:
> > > >
> > > > On Wed, Feb 19, 2020 at 2:34 PM Gurchetan Singh
> > > >  wrote:
> > > > >
> > > > > For old userspace, initialization will still be implicit.
> > > > >
> > > > > For backwards compatibility, enqueue virtio_gpu_cmd_context_create 
> > > > > after
> > > > > the first 3D ioctl.
> > > > >
> > > > > v3: staticify virtio_gpu_create_context
> > > > > remove notify to batch vm-exit
> > > > > v6: Remove nested 3D checks (emil.velikov):
> > > > >   - unify 3D check in resource create
> > > > >   - VIRTIO_GPU_CMD_GET_CAPSET is listed as a 2D ioctl, added a
> > > > > 3D check there.
> > > > I missed this change.  Why do we need a context to get capsets?  Is
> > > > this a workaround or something?
> > >
> > > No, don't need a context to get a capset.  The patch tries to create a
> > > context when a 3D userspace first talks to the host, not when a 3D
> > > userspace first needs a context ID.  If the latter is preferred, we
> > > can do that too.
> >
> > I think it makes sense to exclude the capset ioctls here.
> >
> > Possibly they are used for non-3d-related capabilities at some
> > point in the future.
> >
> > Also userspace gets capsets while checking device capabilities
> > and possibly does so before deciding how to drive the device.
> >
> Virglrenderer creates the internal/base GL* context during
> virgl_renderer_init().
> Based upon which the caps are populated.
>
> Personally I don't have a preference for/against dropping the
> virtio_gpu_create_context().
> Yet it does seems safe to omit.
Yeah, it should be safe to omit.  The userspace might want to decide
the "context type" based on the capsets, so it seems better to omit it
as well.


>
> HTH
> Emil


Re: [PATCH 4/4 v6] drm/virtio: enqueue virtio_gpu_create_context after the first 3D ioctl

2020-02-21 Thread Chia-I Wu
On Wed, Feb 19, 2020 at 2:34 PM Gurchetan Singh
 wrote:
>
> For old userspace, initialization will still be implicit.
>
> For backwards compatibility, enqueue virtio_gpu_cmd_context_create after
> the first 3D ioctl.
>
> v3: staticify virtio_gpu_create_context
> remove notify to batch vm-exit
> v6: Remove nested 3D checks (emil.velikov):
>   - unify 3D check in resource create
>   - VIRTIO_GPU_CMD_GET_CAPSET is listed as a 2D ioctl, added a
> 3D check there.
I missed this change.  Why do we need a context to get capsets?  Is
this a workaround or something?

>
> Reviewed-by: Chia-I Wu 
> Reviewed-by: Emil Velikov 
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |  2 --
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 32 +++---
>  drivers/gpu/drm/virtio/virtgpu_kms.c   |  1 -
>  3 files changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 76b7b7c30e10..95a7443baaba 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -216,8 +216,6 @@ struct virtio_gpu_fpriv {
>  /* virtio_ioctl.c */
>  #define DRM_VIRTIO_NUM_IOCTLS 10
>  extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
> -void virtio_gpu_create_context(struct drm_device *dev,
> -  struct drm_file *file);
>
>  /* virtio_kms.c */
>  int virtio_gpu_init(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index ec38cf5573aa..c36faa572caa 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -33,8 +33,8 @@
>
>  #include "virtgpu_drv.h"
>
> -void virtio_gpu_create_context(struct drm_device *dev,
> -  struct drm_file *file)
> +static void virtio_gpu_create_context(struct drm_device *dev,
> + struct drm_file *file)
>  {
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> @@ -95,6 +95,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
> *dev, void *data,
>
> exbuf->fence_fd = -1;
>
> +   virtio_gpu_create_context(dev, file);
> if (exbuf->flags & VIRTGPU_EXECBUF_FENCE_FD_IN) {
> struct dma_fence *in_fence;
>
> @@ -233,7 +234,17 @@ static int virtio_gpu_resource_create_ioctl(struct 
> drm_device *dev, void *data,
> uint32_t handle = 0;
> struct virtio_gpu_object_params params = { 0 };
>
> -   if (vgdev->has_virgl_3d == false) {
> +   if (vgdev->has_virgl_3d) {
> +   virtio_gpu_create_context(dev, file);
> +   params.virgl = true;
> +   params.target = rc->target;
> +   params.bind = rc->bind;
> +   params.depth = rc->depth;
> +   params.array_size = rc->array_size;
> +   params.last_level = rc->last_level;
> +   params.nr_samples = rc->nr_samples;
> +   params.flags = rc->flags;
> +   } else {
> if (rc->depth > 1)
> return -EINVAL;
> if (rc->nr_samples > 1)
> @@ -250,16 +261,6 @@ static int virtio_gpu_resource_create_ioctl(struct 
> drm_device *dev, void *data,
> params.width = rc->width;
> params.height = rc->height;
> params.size = rc->size;
> -   if (vgdev->has_virgl_3d) {
> -   params.virgl = true;
> -   params.target = rc->target;
> -   params.bind = rc->bind;
> -   params.depth = rc->depth;
> -   params.array_size = rc->array_size;
> -   params.last_level = rc->last_level;
> -   params.nr_samples = rc->nr_samples;
> -   params.flags = rc->flags;
> -   }
> /* allocate a single page size object */
> if (params.size == 0)
> params.size = PAGE_SIZE;
> @@ -319,6 +320,7 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
> drm_device *dev,
> if (vgdev->has_virgl_3d == false)
> return -ENOSYS;
>
> +   virtio_gpu_create_context(dev, file);
> +   objs = virtio_gpu_array_from_handles(file, &args->bo_handle, 1);
> if (objs == NULL)
> return -ENOENT;
> @@ -367,6 +369,7 @@ static int virtio_gpu_transfer_to_host_ioctl(struct 
> drm_device *dev, void *data,
>  

Re: [Bug] virtio-gpu broken with qemu/kvm on arm64 on kernel 5.5+

2020-02-21 Thread Chia-I Wu
On Fri, Feb 21, 2020 at 2:06 AM Guillaume Gardet
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Chia-I Wu 
> > Sent: 20 February 2020 19:41
> > To: Guillaume Gardet 
> > Cc: dri-devel@lists.freedesktop.org; Gerd Hoffmann ;
> > Daniel Vetter ; Catalin Marinas
> > ; nd 
> > Subject: Re: [Bug] virtio-gpu broken with qemu/kvm on arm64 on kernel 5.5+
> >
> > On Thu, Feb 20, 2020 at 4:44 AM Guillaume Gardet 
> > wrote:
> > >
> > > Hi,
> > >
> > > With (guest) kernel 5.5+ with qemu/kvm on arm64, I get lots of display
> > corruptions leading to this kind of screen:
> > > https://openqa.opensuse.org/tests/1174521#step/yast2_i/24
> > Looking at the screenshot, it seems cacheline-related?
>
> It could be.
>
> >
> > There was a change of memory type
> >
> >   https://lists.freedesktop.org/archives/dri-devel/2019-August/233456.html
> >
> > While the guest memory type is ignored on Intel, it is honored on ARM.
> > This attempt to fix it
> >
> >   https://lists.freedesktop.org/archives/dri-devel/2019-December/248271.html
> >
> > does not seem to land.
>
> I applied this patch on top of 5.5.4, but it does not fix the problem.
> Maybe more similar changes are required?
The patch looks legit.  Maybe the memory type is not the root cause?


Re: [PATCH v2] drm/virtio: fix resource id creation race

2020-02-21 Thread Chia-I Wu
On Fri, Feb 21, 2020 at 3:14 AM John Bates  wrote:
>
> The previous code was not thread safe and caused
> undefined behavior from spurious duplicate resource IDs.
> In this patch, an atomic_t is used instead. We no longer
> see any duplicate IDs in tests with this change.
>
> Fixes: 16065fcdd19d ("drm/virtio: do NOT reuse resource ids")
> Signed-off-by: John Bates 
Reviewed-by: Chia-I Wu 
> ---
>  drivers/gpu/drm/virtio/virtgpu_object.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index 017a9e0fc3bb..890121a45625 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -42,8 +42,8 @@ static int virtio_gpu_resource_id_get(struct 
> virtio_gpu_device *vgdev,
>  * "f91a9dd35715 Fix unlinking resources from hash
>  * table." (Feb 2019) fixes the bug.
>  */
> -   static int handle;
> -   handle++;
> +   static atomic_t seqno = ATOMIC_INIT(0);
> +   int handle = atomic_inc_return(&seqno);
> *resid = handle + 1;
resid 1 is (was) discriminated :D

> } else {
> int handle = ida_alloc(&vgdev->resource_ida, GFP_KERNEL);
> --
> 2.25.0.265.gbab2e86ba0-goog
>


Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-21 Thread Chia-I Wu
On Fri, Feb 21, 2020 at 7:59 AM Sean Christopherson
 wrote:
>
> On Thu, Feb 20, 2020 at 09:39:05PM -0800, Tian, Kevin wrote:
> > > From: Chia-I Wu 
> > > Sent: Friday, February 21, 2020 12:51 PM
> > > If you think it is the best for KVM to inspect hva to determine the memory
> > > type with page granularity, that is reasonable and should work for us too.
> > > The userspace can do something (e.g., add a GPU driver dependency to the
> > > hypervisor such that the dma-buf is imported as a GPU memory and mapped
> > > using
> > > vkMapMemory) or I can work with dma-buf maintainers to see if dma-buf's
> > > semantics can be changed.
> >
> > I think you need consider the live migration requirement as Paolo pointed 
> > out.
> > The migration thread needs to read/write the region, then it must use the
> > same type as GPU process and guest to read/write the region. In such case,
> > the hva mapped by Qemu should have the desired type as the guest. However,
> > adding GPU driver dependency to Qemu might trigger some concern. I'm not
> > sure whether there is generic mechanism though, to share dmabuf fd between 
> > GPU
> > process and Qemu while allowing Qemu to follow the desired type w/o using
> > vkMapMemory...
>
> Alternatively, KVM could make KVM_MEM_DMA and KVM_MEM_LOG_DIRTY_PAGES
> mutually exclusive, i.e. force a transition to WB memtype for the guest
> (with appropriate zapping) when migration is activated.  I think that
> would work?
Hm, virtio-gpu does not allow live migration when the 3D function
(virgl=on) is enabled.  This is the relevant code in qemu:

if (virtio_gpu_virgl_enabled(g->conf)) {
error_setg(&g->migration_blocker, "virgl is not yet migratable");

Although we (virtio-gpu and virglrenderer projects) plan to make host
GPU buffers available to the guest via memslots, those buffers should
be considered a part of the "GPU state".  The migration thread should
work with virglrenderer and let virglrenderer save/restore them, if
live migration is to be supported.

QEMU depends on GPU drivers already when configured with
--enable-virglrenderer.  There is vhost-user-gpu that can move the
dependency to a GPU process.  But there are still going to be cases
(e.g., nVidia's proprietary driver does not support dma-buf) where
QEMU cannot avoid GPU driver dependency.




> > Note this is orthogonal to whether introducing a new uapi or implicitly 
> > checking
> > hva to favor guest memory type. It's purely about Qemu itself. Ideally 
> > anyone
> > with the desire to access a dma-buf object should follow the expected 
> > semantics.
> > It's interesting that dma-buf sub-system doesn't provide a centralized
> > synchronization about memory type between multiple mmap paths.


Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-20 Thread Chia-I Wu
(resend because gmail did not format to plain text...)

On Thu, Feb 20, 2020 at 8:45 PM Chia-I Wu  wrote:
>
>
>
> On Thu, Feb 20, 2020 at 4:23 PM Tian, Kevin  wrote:
>>
>> > From: Chia-I Wu 
>> > Sent: Friday, February 21, 2020 6:24 AM
>> >
>> > On Wed, Feb 19, 2020 at 6:38 PM Tian, Kevin  wrote:
>> > >
>> > > > From: Tian, Kevin
>> > > > Sent: Thursday, February 20, 2020 10:05 AM
>> > > >
>> > > > > From: Chia-I Wu 
>> > > > > Sent: Thursday, February 20, 2020 3:37 AM
>> > > > >
>> > > > > On Wed, Feb 19, 2020 at 1:52 AM Tian, Kevin 
>> > wrote:
>> > > > > >
>> > > > > > > From: Paolo Bonzini
>> > > > > > > Sent: Wednesday, February 19, 2020 12:29 AM
>> > > > > > >
>> > > > > > > On 14/02/20 23:03, Sean Christopherson wrote:
>> > > > > > > >> On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu 
>> > > > wrote:
>> > > > > > > >>> AFAICT, it is currently allowed on ARM (verified) and AMD 
>> > > > > > > >>> (not
>> > > > > > > >>> verified, but svm_get_mt_mask returns 0 which supposedly
>> > means
>> > > > > the
>> > > > > > > NPT
>> > > > > > > >>> does not restrict what the guest PAT can do).  This diff 
>> > > > > > > >>> would do
>> > the
>> > > > > > > >>> trick for Intel without needing any uapi change:
>> > > > > > > >> I would be concerned about Intel CPU errata such as SKX40 and
>> > > > SKX59.
>> > > > > > > > The part KVM cares about, #MC, is already addressed by forcing
>> > UC
>> > > > for
>> > > > > > > MMIO.
>> > > > > > > > The data corruption issue is on the guest kernel to correctly 
>> > > > > > > > use
>> > WC
>> > > > > > > > and/or non-temporal writes.
>> > > > > > >
>> > > > > > > What about coherency across live migration?  The userspace
>> > process
>> > > > > would
>> > > > > > > use cached accesses, and also a WBINVD could potentially corrupt
>> > guest
>> > > > > > > memory.
>> > > > > > >
>> > > > > >
>> > > > > > In such case the userspace process possibly should conservatively 
>> > > > > > use
>> > > > > > UC mapping, as if for MMIO regions on a passthrough device.
>> > However
>> > > > > > there remains a problem. the definition of KVM_MEM_DMA implies
>> > > > > > favoring guest setting, which could be whatever type in concept. 
>> > > > > > Then
>> > > > > > assuming UC is also problematic. I'm not sure whether inventing
>> > another
>> > > > > > interface to query effective memory type from KVM is a good idea.
>> > There
>> > > > > > is no guarantee that the guest will use same type for every page 
>> > > > > > in the
>> > > > > > same slot, then such interface might be messy. Alternatively, maybe
>> > > > > > we could just have an interface for KVM userspace to force memory
>> > type
>> > > > > > for a given slot, if it is mainly used in para-virtualized 
>> > > > > > scenarios (e.g.
>> > > > > > virtio-gpu) where the guest is enlightened to use a forced type 
>> > > > > > (e.g.
>> > WC)?
>> > > > > KVM forcing the memory type for a given slot should work too.  But 
>> > > > > the
>> > > > > ignore-guest-pat bit seems to be Intel-specific.  We will need to
>> > > > > define how the second-level page attributes combine with the guest
>> > > > > page attributes somehow.
>> > > >
>> > > > oh, I'm not aware of that difference. without an ipat-equivalent
>> > > > capability, I'm not sure how to forcing random type here. If you look 
>> > > > at
>> > > > table 11-7 in Intel SDM, none of MTRR (EPT) memory type can lead to
>> > > > consistent effective type w

Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-20 Thread Chia-I Wu
On Thu, Feb 20, 2020 at 4:23 PM Tian, Kevin  wrote:

> > From: Chia-I Wu 
> > Sent: Friday, February 21, 2020 6:24 AM
> >
> > On Wed, Feb 19, 2020 at 6:38 PM Tian, Kevin 
> wrote:
> > >
> > > > From: Tian, Kevin
> > > > Sent: Thursday, February 20, 2020 10:05 AM
> > > >
> > > > > From: Chia-I Wu 
> > > > > Sent: Thursday, February 20, 2020 3:37 AM
> > > > >
> > > > > On Wed, Feb 19, 2020 at 1:52 AM Tian, Kevin 
> > wrote:
> > > > > >
> > > > > > > From: Paolo Bonzini
> > > > > > > Sent: Wednesday, February 19, 2020 12:29 AM
> > > > > > >
> > > > > > > On 14/02/20 23:03, Sean Christopherson wrote:
> > > > > > > >> On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu <
> olva...@gmail.com>
> > > > wrote:
> > > > > > > >>> AFAICT, it is currently allowed on ARM (verified) and AMD
> (not
> > > > > > > >>> verified, but svm_get_mt_mask returns 0 which supposedly
> > means
> > > > > the
> > > > > > > NPT
> > > > > > > >>> does not restrict what the guest PAT can do).  This diff
> would do
> > the
> > > > > > > >>> trick for Intel without needing any uapi change:
> > > > > > > >> I would be concerned about Intel CPU errata such as SKX40
> and
> > > > SKX59.
> > > > > > > > The part KVM cares about, #MC, is already addressed by
> forcing
> > UC
> > > > for
> > > > > > > MMIO.
> > > > > > > > The data corruption issue is on the guest kernel to
> correctly use
> > WC
> > > > > > > > and/or non-temporal writes.
> > > > > > >
> > > > > > > What about coherency across live migration?  The userspace
> > process
> > > > > would
> > > > > > > use cached accesses, and also a WBINVD could potentially
> corrupt
> > guest
> > > > > > > memory.
> > > > > > >
> > > > > >
> > > > > > In such case the userspace process possibly should
> conservatively use
> > > > > > UC mapping, as if for MMIO regions on a passthrough device.
> > However
> > > > > > there remains a problem. the definition of KVM_MEM_DMA implies
> > > > > > favoring guest setting, which could be whatever type in concept.
> Then
> > > > > > assuming UC is also problematic. I'm not sure whether inventing
> > another
> > > > > > interface to query effective memory type from KVM is a good idea.
> > There
> > > > > > is no guarantee that the guest will use same type for every page
> in the
> > > > > > same slot, then such interface might be messy. Alternatively,
> maybe
> > > > > > we could just have an interface for KVM userspace to force memory
> > type
> > > > > > for a given slot, if it is mainly used in para-virtualized
> scenarios (e.g.
> > > > > > virtio-gpu) where the guest is enlightened to use a forced type
> (e.g.
> > WC)?
> > > > > KVM forcing the memory type for a given slot should work too.  But
> the
> > > > > ignore-guest-pat bit seems to be Intel-specific.  We will need to
> > > > > define how the second-level page attributes combine with the guest
> > > > > page attributes somehow.
> > > >
> > > > oh, I'm not aware of that difference. without an ipat-equivalent
> > > > capability, I'm not sure how to forcing random type here. If you
> look at
> > > > table 11-7 in Intel SDM, none of MTRR (EPT) memory type can lead to
> > > > consistent effective type when combining with random PAT value. So
> > > >  it is definitely a dead end.
> > > >
> > > > >
> > > > > KVM should in theory be able to tell that the userspace region is
> > > > > mapped with a certain memory type and can force the same memory
> > type
> > > > > onto the guest.  The userspace does not need to be involved.  But
> that
> > > > > sounds very slow?  This may be a dumb question, but would it help
> to
> > > > > add KVM_SET_DMA_BUF and let KVM negotiate the memory type with
> > the
> > > > > in-kernel GPU drivers?
> > &g

Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-20 Thread Chia-I Wu
On Wed, Feb 19, 2020 at 6:13 PM Tian, Kevin  wrote:
> > > Curious... How is such slot exposed to the guest? A reserved memory
> > > region? Is it static or might be dynamically added?
> > The plan is for virtio-gpu device to reserve a huge memory region in
> > the guest.  Memslots may be added dynamically or statically to back
> > the region.
>
> so the region is marked as E820_RESERVED to prevent guest kernel
> from using it for other purpose and then virtio-gpu device will report
> virtio-gpu driver of the exact location of the region through its own
> interface?
The current plan is that the virtio-gpu device will have a bar for the
region, which is like the vram aperture on real GPUs.  The virtio-gpu
driver manages the region like managing vram.

When the guest userspace allocates from vram, the guest kernel
reserves an unused range from the region and tells the host the
offset.  The host allocates a real GPU buffer, maps the buffer, and
adds a memslot with gpa == bar_base + offset (or mremap).  When the
guest userspace calls mmap, the guest kernel does an io_remap_pfn_range().


Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-20 Thread Chia-I Wu
On Wed, Feb 19, 2020 at 6:38 PM Tian, Kevin  wrote:
>
> > From: Tian, Kevin
> > Sent: Thursday, February 20, 2020 10:05 AM
> >
> > > From: Chia-I Wu 
> > > Sent: Thursday, February 20, 2020 3:37 AM
> > >
> > > On Wed, Feb 19, 2020 at 1:52 AM Tian, Kevin  wrote:
> > > >
> > > > > From: Paolo Bonzini
> > > > > Sent: Wednesday, February 19, 2020 12:29 AM
> > > > >
> > > > > On 14/02/20 23:03, Sean Christopherson wrote:
> > > > > >> On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu 
> > wrote:
> > > > > >>> AFAICT, it is currently allowed on ARM (verified) and AMD (not
> > > > > >>> verified, but svm_get_mt_mask returns 0 which supposedly means
> > > the
> > > > > NPT
> > > > > >>> does not restrict what the guest PAT can do).  This diff would do 
> > > > > >>> the
> > > > > >>> trick for Intel without needing any uapi change:
> > > > > >> I would be concerned about Intel CPU errata such as SKX40 and
> > SKX59.
> > > > > > The part KVM cares about, #MC, is already addressed by forcing UC
> > for
> > > > > MMIO.
> > > > > > The data corruption issue is on the guest kernel to correctly use WC
> > > > > > and/or non-temporal writes.
> > > > >
> > > > > What about coherency across live migration?  The userspace process
> > > would
> > > > > use cached accesses, and also a WBINVD could potentially corrupt guest
> > > > > memory.
> > > > >
> > > >
> > > > In such case the userspace process possibly should conservatively use
> > > > UC mapping, as if for MMIO regions on a passthrough device. However
> > > > there remains a problem. the definition of KVM_MEM_DMA implies
> > > > favoring guest setting, which could be whatever type in concept. Then
> > > > assuming UC is also problematic. I'm not sure whether inventing another
> > > > interface to query effective memory type from KVM is a good idea. There
> > > > is no guarantee that the guest will use same type for every page in the
> > > > same slot, then such interface might be messy. Alternatively, maybe
> > > > we could just have an interface for KVM userspace to force memory type
> > > > for a given slot, if it is mainly used in para-virtualized scenarios 
> > > > (e.g.
> > > > virtio-gpu) where the guest is enlightened to use a forced type (e.g. 
> > > > WC)?
> > > KVM forcing the memory type for a given slot should work too.  But the
> > > ignore-guest-pat bit seems to be Intel-specific.  We will need to
> > > define how the second-level page attributes combine with the guest
> > > page attributes somehow.
> >
> > Oh, I'm not aware of that difference.  Without an ipat-equivalent
> > capability, I'm not sure how to force a particular type here.  If you look
> > at table 11-7 in the Intel SDM, no MTRR (EPT) memory type leads to a
> > consistent effective type when combined with an arbitrary PAT value, so
> > it is definitely a dead end.
> >
> > >
> > > KVM should in theory be able to tell that the userspace region is
> > > mapped with a certain memory type and can force the same memory type
> > > onto the guest.  The userspace does not need to be involved.  But that
> > > sounds very slow?  This may be a dumb question, but would it help to
> > > add KVM_SET_DMA_BUF and let KVM negotiate the memory type with the
> > > in-kernel GPU drivers?
> > >
> > >
> >
> > KVM_SET_DMA_BUF looks more reasonable. But I guess we don't need
> > KVM to be aware of such negotiation. We can continue your original
> > proposal to have KVM simply favor guest memory type (maybe still call
> > KVM_MEM_DMA). On the other hand, Qemu should just mmap on the
> > fd handle of the dmabuf passed from the virtio-gpu device backend,  e.g.
> > to conduct migration. That way the mmap request is finally served by
> > DRM and underlying GPU drivers, with proper type enforced automatically.
> >
>
> Thinking more possibly we don't need introduce new interface to KVM.
> As long as Qemu uses dmabuf interface to mmap the specific region,
> KVM can simply check memory type in host page table given hva of a
> memslot. If the type is UC or WC, it implies that userspace wants a
> non-coherent mapping which should be reflected in the guest side too.
>

Re: [Bug] virtio-gpu broken with qemu/kvm on arm64 on kernel 5.5+

2020-02-20 Thread Chia-I Wu
On Thu, Feb 20, 2020 at 4:44 AM Guillaume Gardet  wrote:
>
> Hi,
>
> With (guest) kernel 5.5+ with qemu/kvm on arm64, I get lots of display 
> corruptions leading to this kind of screen:
> https://openqa.opensuse.org/tests/1174521#step/yast2_i/24
Looking at the screenshot, it seems cacheline-related?

There was a change of memory type

  https://lists.freedesktop.org/archives/dri-devel/2019-August/233456.html

While the guest memory type is ignored on Intel, it is honored on ARM.
This attempt to fix it

  https://lists.freedesktop.org/archives/dri-devel/2019-December/248271.html

does not seem to have landed.


>
> I git bisected it to commit c66df701e783bc666593e6e665f13670760883ee
> **
> drm/virtio: switch from ttm to gem shmem helpers
>
>   virtio-gpu basically needs a sg_table for the bo, to tell the host where
>   the backing pages for the object are.  So the gem shmem helpers are a
>   perfect fit.  Some drm_gem_object_funcs need thin wrappers to update the
>   host state, but otherwise the helpers handle everything just fine.
>
>   Once the fencing was sorted the switch was surprisingly easy and for the
>   most part just removing the ttm code.
>
>   v4: fix drm_gem_object_funcs name.
>
>   Signed-off-by: Gerd Hoffmann 
>   Acked-by: Daniel Vetter 
>   Reviewed-by: Chia-I Wu 
> Link: http://patchwork.freedesktop.org/patch/msgid/20190829103301.3539-15-kra...@redhat.com
> **
>
> I also tested kernel 5.6-rc2 which has the same bug.
> Without kvm, the display is fine.
>
> Regards,
> Guillaume
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/virtio: fix virtio-gpu resource id creation race

2020-02-20 Thread Chia-I Wu
On Thu, Feb 20, 2020 at 5:30 AM Emil Velikov  wrote:
>
> Hi John,
>
> On Thu, 20 Feb 2020 at 08:45, John Bates  wrote:
> >
> > The previous code was not thread safe and caused
> > undefined behavior from spurious duplicate resource IDs.
> > In this patch, an atomic_t is used instead. We no longer
> > see any duplicate IDs in tests with this change.
> >
> > Signed-off-by: John Bates 
> Adding a fixes tag like below makes it easier to track. Especially for
> Greg and team who are working on stable kernels.
>
> Fixes: 3e93bc2a58aa ("drm/virtio: make resource id workaround runtime
> switchable.")
FWIW, the fixes tag should refer to this commit instead

commit 16065fcdd19ddb9e093192914ac863884f308766
Author: Gerd Hoffmann 
Date:   Fri Feb 8 15:04:09 2019 +0100

drm/virtio: do NOT reuse resource ids

>
> HTH
> Emil


Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-19 Thread Chia-I Wu
On Wed, Feb 19, 2020 at 1:52 AM Tian, Kevin  wrote:
>
> > From: Paolo Bonzini
> > Sent: Wednesday, February 19, 2020 12:29 AM
> >
> > On 14/02/20 23:03, Sean Christopherson wrote:
> > >> On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu  wrote:
> > >>> AFAICT, it is currently allowed on ARM (verified) and AMD (not
> > >>> verified, but svm_get_mt_mask returns 0 which supposedly means the NPT
> > >>> does not restrict what the guest PAT can do).  This diff would do the
> > >>> trick for Intel without needing any uapi change:
> > >> I would be concerned about Intel CPU errata such as SKX40 and SKX59.
> > > The part KVM cares about, #MC, is already addressed by forcing UC for MMIO.
> > > The data corruption issue is on the guest kernel to correctly use WC
> > > and/or non-temporal writes.
> >
> > What about coherency across live migration?  The userspace process would
> > use cached accesses, and also a WBINVD could potentially corrupt guest
> > memory.
> >
>
> In such case the userspace process possibly should conservatively use
> UC mapping, as if for MMIO regions on a passthrough device. However
> there remains a problem: the definition of KVM_MEM_DMA implies
> favoring the guest setting, which could be whatever type in concept. Then
> assuming UC is also problematic. I'm not sure whether inventing another
> interface to query the effective memory type from KVM is a good idea. There
> is no guarantee that the guest will use the same type for every page in the
> same slot, so such an interface might be messy. Alternatively, maybe
> we could just have an interface for KVM userspace to force the memory type
> for a given slot, if it is mainly used in para-virtualized scenarios (e.g.
> virtio-gpu) where the guest is enlightened to use a forced type (e.g. WC)?
KVM forcing the memory type for a given slot should work too.  But the
ignore-guest-pat bit seems to be Intel-specific.  We will need to
define how the second-level page attributes combine with the guest
page attributes somehow.

KVM should in theory be able to tell that the userspace region is
mapped with a certain memory type and can force the same memory type
onto the guest.  The userspace does not need to be involved.  But that
sounds very slow?  This may be a dumb question, but would it help to
add KVM_SET_DMA_BUF and let KVM negotiate the memory type with the
in-kernel GPU drivers?


>
> Thanks
> Kevin


Re: [RFC PATCH 0/3] KVM: x86: honor guest memory type

2020-02-19 Thread Chia-I Wu
On Wed, Feb 19, 2020 at 2:00 AM Tian, Kevin  wrote:
>
> > From: Chia-I Wu
> > Sent: Saturday, February 15, 2020 5:15 AM
> >
> > On Fri, Feb 14, 2020 at 2:26 AM Paolo Bonzini  wrote:
> > >
> > > On 13/02/20 23:18, Chia-I Wu wrote:
> > > >
> > > > The bug you mentioned was probably this one
> > > >
> > > >   https://bugzilla.kernel.org/show_bug.cgi?id=104091
> > >
> > > Yes, indeed.
> > >
> > > > From what I can tell, the commit allowed the guests to create cached
> > > > mappings to MMIO regions and caused MCEs.  That is different than what
> > > > I need, which is to allow guests to create uncached mappings to system
> > > > ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached
> > > > mappings.  But it is true that this still allows the userspace & guest
> > > > kernel to create conflicting memory types.
> > >
> > > Right, the question is whether the MCEs were tied to MMIO regions
> > > specifically and if so why.
> > >
> > > An interesting remark is in the footnote of table 11-7 in the SDM.
> > > There, for the MTRR (EPT for us) memory type UC you can read:
> > >
> > >   The UC attribute comes from the MTRRs and the processors are not
> > >   required to snoop their caches since the data could never have
> > >   been cached. This attribute is preferred for performance reasons.
> > >
> > > There are two possibilities:
> > >
> > > 1) the footnote doesn't apply to UC mode coming from EPT page tables.
> > > That would make your change safe.
> > >
> > > 2) the footnote also applies when the UC attribute comes from the EPT
> > > page tables rather than the MTRRs.  In that case, the host should use
> > > UC as the EPT page attribute if and only if it's consistent with the host
> > > MTRRs; it would be more or less impossible to honor UC in the guest MTRRs.
> > > In that case, something like the patch below would be needed.
> > >
> > > It is not clear from the manual why the footnote would not apply to WC; that
> > > is, the manual doesn't say explicitly that the processor does not do snooping
> > > for accesses to WC memory.  But I guess that must be the case, which is why I
> > > used MTRR_TYPE_WRCOMB in the patch below.
> > >
> > > Either way, we would have an explanation of why creating cached mapping to
> > > MMIO regions would, and why in practice we're not seeing MCEs for guest RAM
> > > (the guest would have set WB for that memory in its MTRRs, not UC).
> > >
> > > One thing you didn't say: how would userspace use KVM_MEM_DMA?  On which
> > > regions would it be set?
> > It will be set for shmems that are mapped WC.
> >
> > GPU/DRM drivers allocate shmems as DMA-able gpu buffers and allow the
> > userspace to map them cached or WC (I915_MMAP_WC or
> > AMDGPU_GEM_CREATE_CPU_GTT_USWC for example).  When a shmem is mapped
> > WC and is made available to the guest, we would like the ability to
> > map the region WC in the guest.
>
> Curious... How is such slot exposed to the guest? A reserved memory
> region? Is it static or might be dynamically added?
The plan is for virtio-gpu device to reserve a huge memory region in
the guest.  Memslots may be added dynamically or statically to back
the region.

Dynamic: the host adds a 16MB GPU allocation as a memslot at a time.
The guest kernel suballocates from the 16MB pool.

Static: the host creates a huge PROT_NONE memfd and adds it as a
memslot.  GPU allocations are mremap()ed into the memfd region to
provide the real mapping.

These options are considered because the number of memslots is
limited: 32 on ARM and 509 on x86.  If the number of memslots could be
made larger (4096 or more), we would also consider adding each
individual GPU allocation as a memslot.

These are actually questions on which we need feedback.  Besides, GPU
allocations can be assumed to be kernel dma-bufs in this context.  I
wonder if it makes sense to have a variation of
KVM_SET_USER_MEMORY_REGION that takes dma-bufs.


>
> Thanks
> Kevin


Re: [PATCH 3/5 v5] drm/virtio: track whether or not a context has been initiated

2020-02-19 Thread Chia-I Wu
Patch 1-4 are

Reviewed-by: Chia-I Wu 

I think we can drop patch 5 for now.


On Wed, Feb 19, 2020 at 9:56 AM Gurchetan Singh  wrote:
>
> Use an atomic variable to track whether a context has been
> initiated.
>
> v5: Fix possible race and sleep via mutex (@olv)
>
> Signed-off-by: Gurchetan Singh 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   | 2 ++
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 8 
>  drivers/gpu/drm/virtio/virtgpu_kms.c   | 3 +++
>  3 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 72c1d9b59dfe..0596d9618554 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -209,6 +209,8 @@ struct virtio_gpu_device {
>
>  struct virtio_gpu_fpriv {
> uint32_t ctx_id;
> +   bool context_initiated;
ctx_id and context_initiated are named hw_res_handle and created
respectively in virtio_gpu_object.  I think we should use "hw",
"id"/"handle", "initialized"/"created" more consistently.

This is just a nitpick, and does not need to be a part of this series.

> +   struct mutex context_lock;
>  };
>
>  /* virtio_ioctl.c */
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index de04f80f737d..c1a6cb4ec375 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -44,10 +44,18 @@ void virtio_gpu_create_context(struct drm_device *dev,
> if (!vgdev->has_virgl_3d)
> return;
>
> +   mutex_lock(&vfpriv->context_lock);
> +   if (vfpriv->context_initiated)
> +   goto out_unlock;
> +
> get_task_comm(dbgname, current);
> virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
>   strlen(dbgname), dbgname);
> virtio_gpu_notify(vgdev);
> +   vfpriv->context_initiated = true;
> +
> +out_unlock:
> +   mutex_unlock(&vfpriv->context_lock);
>  }
>
>  static int virtio_gpu_map_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
> index f7e3712502ca..424729cb81d1 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_kms.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
> @@ -258,6 +258,8 @@ int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file)
> if (!vfpriv)
> return -ENOMEM;
>
> +   mutex_init(&vfpriv->context_lock);
> +
> handle = ida_alloc(&vgdev->ctx_id_ida, GFP_KERNEL);
> if (handle < 0) {
> kfree(vfpriv);
> @@ -281,6 +283,7 @@ void virtio_gpu_driver_postclose(struct drm_device *dev, struct drm_file *file)
> vfpriv = file->driver_priv;
>
> virtio_gpu_context_destroy(vgdev, vfpriv->ctx_id);
> +   mutex_destroy(&vfpriv->context_lock);
> kfree(vfpriv);
> file->driver_priv = NULL;
>  }
> --
> 2.25.0.265.gbab2e86ba0-goog
>


Re: [PATCH 3/5] drm/virtio: track whether or not a context has been initiated

2020-02-14 Thread Chia-I Wu
On Fri, Feb 14, 2020 at 6:29 PM Gurchetan Singh  wrote:
>
> On Fri, Feb 14, 2020 at 11:27 AM Chia-I Wu  wrote:
> >
> > On Thu, Feb 13, 2020 at 3:18 PM Gurchetan Singh  wrote:
> > >
> > > Use an atomic variable to track whether a context has been
> > > initiated.
> > >
> > > v2: Fix possible race (@olv)
> > >
> > > Signed-off-by: Gurchetan Singh 
> > > ---
> > >  drivers/gpu/drm/virtio/virtgpu_drv.h   | 1 +
> > >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 3 +++
> > >  drivers/gpu/drm/virtio/virtgpu_kms.c   | 1 +
> > >  3 files changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > index 72c1d9b59dfe..ca505984f8ab 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > @@ -209,6 +209,7 @@ struct virtio_gpu_device {
> > >
> > >  struct virtio_gpu_fpriv {
> > > uint32_t ctx_id;
> > > +   atomic_t context_initiated;
> > >  };
> > >
> > >  /* virtio_ioctl.c */
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > index 896c3f419a6d..a98884462944 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > @@ -44,6 +44,9 @@ void virtio_gpu_create_context(struct drm_device *dev,
> > > if (!vgdev->has_virgl_3d)
> > > return;
> > >
> > > +   if (!atomic_add_unless(&vfpriv->context_initiated, 1, 1))
> > > +   return;
> > > +
> > How does this work?  When thread A and B enter this function at the
> > same time, and thread B returns early, it is possible that thread B
> > queues other commands before thread A has the chance to queue
> > virtio_gpu_cmd_context_create.
>
> Good catch, I'll add a spinlock in v3.
virtio_gpu_cmd_context_create can sleep.  You will need a mutex (or
figure out another way to do the lazy initialization).

>
> >
> > > get_task_comm(dbgname, current);
> > > virtio_gpu_cmd_context_create(vgdev, vfpriv->ctx_id,
> > >   strlen(dbgname), dbgname);
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
> > > index 282558576527..25248bad3fc4 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_kms.c
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
> > > @@ -263,6 +263,7 @@ int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file)
> > > }
> > >
> > > vfpriv->ctx_id = handle + 1;
> > > +   atomic_set(&vfpriv->context_initiated, 0);
> > > file->driver_priv = vfpriv;
> > > virtio_gpu_create_context(dev, file);
> > > return 0;
> > > --
> > > 2.25.0.265.gbab2e86ba0-goog
> > >

