Re: [PATCH v7 13/15] drm/tegra: Implement job submission part of new UAPI

2021-06-16 Thread Jon Hunter
Hi Mikko,

On 10/06/2021 12:04, Mikko Perttunen wrote:
> Implement the job submission IOCTL with a minimum feature set.
> 
> Signed-off-by: Mikko Perttunen 
> ---
> v7:
> * Allocate gather BO with DMA API to get page-aligned
>   memory
> * Add error prints to a few places where they were missing
> v6:
> * Remove sgt bypass path in gather_bo - this would cause
>   cache maintenance to be skipped and is unnecessary in
>   general.
> * Changes related to moving to using syncpoint IDs
> * Add syncobj related code
> * Print warning on submit failure describing the issue
> * Use ARCH_DMA_ADDR_T_64BIT to check if that is indeed
>   the case
> * Add support for relative syncpoint wait
> * Use pm_runtime_resume_and_get
> * Only try to resume engines that support runtime PM
> * Removed uapi subdirectory
> * Don't use "copy_err" variables for copy_from_user
>   return value
> * Fix setting of blocklinear flag
> v5:
> * Add 16K size limit to copies from userspace.
> * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
>   to prevent oversized shift on 32-bit platforms.
> v4:
> * Remove all features that are not strictly necessary.
> * Split into two patches.
> v3:
> * Remove WRITE_RELOC. Relocations are now patched implicitly
>   when patching is needed.
> * Directly call PM runtime APIs on devices instead of using
>   power_on/power_off callbacks.
> * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
> * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
> * Accommodate for removal of timeout field and inlining of
>   syncpt_incrs array.
> * Copy entire user arrays at a time instead of going through
>   elements one-by-one.
> * Implement waiting of DMA reservations.
> * Split out gather_bo implementation into a separate file.
> * Fix length parameter passed to sg_init_one in gather_bo
> * Cosmetic cleanup.
> ---
>  drivers/gpu/drm/tegra/Makefile|   2 +
>  drivers/gpu/drm/tegra/drm.c   |   4 +-
>  drivers/gpu/drm/tegra/gather_bo.c |  82 +
>  drivers/gpu/drm/tegra/gather_bo.h |  24 ++
>  drivers/gpu/drm/tegra/submit.c| 549 ++
>  drivers/gpu/drm/tegra/submit.h|  17 +
>  6 files changed, 677 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/tegra/gather_bo.c
>  create mode 100644 drivers/gpu/drm/tegra/gather_bo.h
>  create mode 100644 drivers/gpu/drm/tegra/submit.c
>  create mode 100644 drivers/gpu/drm/tegra/submit.h

...

> diff --git a/drivers/gpu/drm/tegra/submit.c b/drivers/gpu/drm/tegra/submit.c
> new file mode 100644
> index ..e3200c10ca9e
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/submit.c
> @@ -0,0 +1,549 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "drm.h"
> +#include "gather_bo.h"
> +#include "gem.h"
> +#include "submit.h"
> +#include "uapi.h"
> +
> +#define SUBMIT_ERR(ctx, fmt, ...) \
> + dev_err_ratelimited(ctx->client->base.dev, \
> + "%s: job submission failed: " fmt "\n", \
> + current->comm __VA_OPT__(,) __VA_ARGS__)


For older compilers that don't support __VA_OPT__ this generates a
compilation error ...

drivers/gpu/drm/tegra/submit.c: In function ‘submit_copy_gather_data’:
drivers/gpu/drm/tegra/submit.c:27:17: error: expected ‘)’ before
‘__VA_OPT__’
   current->comm __VA_OPT__(,) __VA_ARGS__)
 ^
I think that we may just have to use ##__VA_ARGS__ here.

Cheers
Jon

-- 
nvpublic


Re: [PATCH v7 13/15] drm/tegra: Implement job submission part of new UAPI

2021-06-15 Thread Dmitry Osipenko
10.06.2021 14:04, Mikko Perttunen пишет:
> +drop_refs:
> + for (;;) {
> + if (i-- == 0)
> + break;
> +

while(i--) ?


Re: [PATCH v7 13/15] drm/tegra: Implement job submission part of new UAPI

2021-06-15 Thread Dmitry Osipenko
10.06.2021 14:04, Mikko Perttunen пишет:
> +++ b/drivers/gpu/drm/tegra/gather_bo.c
> @@ -0,0 +1,82 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "gather_bo.h"
> +
...
> +const struct host1x_bo_ops gather_bo_ops = {
> + .get = gather_bo_get,
> + .put = gather_bo_put,
> + .pin = gather_bo_pin,
> + .unpin = gather_bo_unpin,
> + .mmap = gather_bo_mmap,
> + .munmap = gather_bo_munmap,
> +};

I think it's a wrong to model host1x bo as a part of DRM driver. It is
akin to the ill-defined model of DRM GEMS represented by host1x_bo that
current mainline driver uses.

Host1x BO should belong to Host1x driver. DRM BO should belong to DRM
driver. Mixing them together makes no sense, it is very unnatural and
confusing. This should be a part of the driver reorganization discussion.


Re: [PATCH v7 13/15] drm/tegra: Implement job submission part of new UAPI

2021-06-15 Thread Mikko Perttunen




On 6/15/21 10:00 PM, Jon Hunter wrote:


On 10/06/2021 12:04, Mikko Perttunen wrote:

Implement the job submission IOCTL with a minimum feature set.

Signed-off-by: Mikko Perttunen 
---
v7:
* Allocate gather BO with DMA API to get page-aligned
   memory
* Add error prints to a few places where they were missing
v6:
* Remove sgt bypass path in gather_bo - this would cause
   cache maintenance to be skipped and is unnecessary in
   general.
* Changes related to moving to using syncpoint IDs
* Add syncobj related code
* Print warning on submit failure describing the issue
* Use ARCH_DMA_ADDR_T_64BIT to check if that is indeed
   the case
* Add support for relative syncpoint wait
* Use pm_runtime_resume_and_get
* Only try to resume engines that support runtime PM
* Removed uapi subdirectory
* Don't use "copy_err" variables for copy_from_user
   return value
* Fix setting of blocklinear flag
v5:
* Add 16K size limit to copies from userspace.
* Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
   to prevent oversized shift on 32-bit platforms.
v4:
* Remove all features that are not strictly necessary.
* Split into two patches.
v3:
* Remove WRITE_RELOC. Relocations are now patched implicitly
   when patching is needed.
* Directly call PM runtime APIs on devices instead of using
   power_on/power_off callbacks.
* Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
* Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
* Accommodate for removal of timeout field and inlining of
   syncpt_incrs array.
* Copy entire user arrays at a time instead of going through
   elements one-by-one.
* Implement waiting of DMA reservations.
* Split out gather_bo implementation into a separate file.
* Fix length parameter passed to sg_init_one in gather_bo
* Cosmetic cleanup.
---
  drivers/gpu/drm/tegra/Makefile|   2 +
  drivers/gpu/drm/tegra/drm.c   |   4 +-
  drivers/gpu/drm/tegra/gather_bo.c |  82 +
  drivers/gpu/drm/tegra/gather_bo.h |  24 ++
  drivers/gpu/drm/tegra/submit.c| 549 ++
  drivers/gpu/drm/tegra/submit.h|  17 +
  6 files changed, 677 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/tegra/gather_bo.c
  create mode 100644 drivers/gpu/drm/tegra/gather_bo.h
  create mode 100644 drivers/gpu/drm/tegra/submit.c
  create mode 100644 drivers/gpu/drm/tegra/submit.h


...


+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+  struct drm_file *file)
+{
+   struct tegra_drm_file *fpriv = file->driver_priv;
+   struct drm_tegra_channel_submit *args = data;
+   struct tegra_drm_submit_data *job_data;
+   struct drm_syncobj *syncobj = NULL;
+   struct tegra_drm_context *ctx;
+   struct host1x_job *job;
+   struct gather_bo *bo;
+   u32 i;
+   int err;
+
+   mutex_lock(>lock);
+   ctx = xa_load(>contexts, args->channel_ctx);
+   if (!ctx) {
+   mutex_unlock(>lock);
+   pr_err_ratelimited("%s: %s: invalid channel_ctx '%d'", __func__,
+   current->comm, args->channel_ctx);
+   return -EINVAL;
+   }
+
+   if (args->syncobj_in) {
+   struct dma_fence *fence;
+
+   err = drm_syncobj_find_fence(file, args->syncobj_in, 0, 0, 
);
+   if (err) {
+   SUBMIT_ERR(ctx, "invalid syncobj_in '%d'", 
args->syncobj_in);
+   goto unlock;
+   }
+
+   err = dma_fence_wait_timeout(fence, true, 
msecs_to_jiffies(1));
+   dma_fence_put(fence);
+   if (err) {
+   SUBMIT_ERR(ctx, "wait for syncobj_in timed out");
+   goto unlock;
+   }
+   }
+
+   if (args->syncobj_out) {
+   syncobj = drm_syncobj_find(file, args->syncobj_out);
+   if (!syncobj) {
+   SUBMIT_ERR(ctx, "invalid syncobj_out '%d'", 
args->syncobj_out);
+   err = -ENOENT;
+   goto unlock;
+   }
+   }
+
+   /* Allocate gather BO and copy gather words in. */
+   err = submit_copy_gather_data(, drm->dev, ctx, args);
+   if (err)
+   goto unlock;
+
+   job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+   if (!job_data) {
+   SUBMIT_ERR(ctx, "failed to allocate memory for job data");
+   err = -ENOMEM;
+   goto put_bo;
+   }
+
+   /* Get data buffer mappings and do relocation patching. */
+   err = submit_process_bufs(ctx, bo, args, job_data);
+   if (err)
+   goto free_job_data;
+
+   /* Allocate host1x_job and add gathers and waits to it. */
+   err = submit_create_job(, ctx, bo, args, job_data,
+   >syncpoints);
+   if (err)
+   goto free_job_data;
+
+   /* Map gather data for Host1x. */
+   err = host1x_job_pin(job, 

Re: [PATCH v7 13/15] drm/tegra: Implement job submission part of new UAPI

2021-06-15 Thread Jon Hunter


On 10/06/2021 12:04, Mikko Perttunen wrote:
> Implement the job submission IOCTL with a minimum feature set.
> 
> Signed-off-by: Mikko Perttunen 
> ---
> v7:
> * Allocate gather BO with DMA API to get page-aligned
>   memory
> * Add error prints to a few places where they were missing
> v6:
> * Remove sgt bypass path in gather_bo - this would cause
>   cache maintenance to be skipped and is unnecessary in
>   general.
> * Changes related to moving to using syncpoint IDs
> * Add syncobj related code
> * Print warning on submit failure describing the issue
> * Use ARCH_DMA_ADDR_T_64BIT to check if that is indeed
>   the case
> * Add support for relative syncpoint wait
> * Use pm_runtime_resume_and_get
> * Only try to resume engines that support runtime PM
> * Removed uapi subdirectory
> * Don't use "copy_err" variables for copy_from_user
>   return value
> * Fix setting of blocklinear flag
> v5:
> * Add 16K size limit to copies from userspace.
> * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
>   to prevent oversized shift on 32-bit platforms.
> v4:
> * Remove all features that are not strictly necessary.
> * Split into two patches.
> v3:
> * Remove WRITE_RELOC. Relocations are now patched implicitly
>   when patching is needed.
> * Directly call PM runtime APIs on devices instead of using
>   power_on/power_off callbacks.
> * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
> * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
> * Accommodate for removal of timeout field and inlining of
>   syncpt_incrs array.
> * Copy entire user arrays at a time instead of going through
>   elements one-by-one.
> * Implement waiting of DMA reservations.
> * Split out gather_bo implementation into a separate file.
> * Fix length parameter passed to sg_init_one in gather_bo
> * Cosmetic cleanup.
> ---
>  drivers/gpu/drm/tegra/Makefile|   2 +
>  drivers/gpu/drm/tegra/drm.c   |   4 +-
>  drivers/gpu/drm/tegra/gather_bo.c |  82 +
>  drivers/gpu/drm/tegra/gather_bo.h |  24 ++
>  drivers/gpu/drm/tegra/submit.c| 549 ++
>  drivers/gpu/drm/tegra/submit.h|  17 +
>  6 files changed, 677 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/tegra/gather_bo.c
>  create mode 100644 drivers/gpu/drm/tegra/gather_bo.h
>  create mode 100644 drivers/gpu/drm/tegra/submit.c
>  create mode 100644 drivers/gpu/drm/tegra/submit.h

...

> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
> +struct drm_file *file)
> +{
> + struct tegra_drm_file *fpriv = file->driver_priv;
> + struct drm_tegra_channel_submit *args = data;
> + struct tegra_drm_submit_data *job_data;
> + struct drm_syncobj *syncobj = NULL;
> + struct tegra_drm_context *ctx;
> + struct host1x_job *job;
> + struct gather_bo *bo;
> + u32 i;
> + int err;
> +
> + mutex_lock(>lock);
> + ctx = xa_load(>contexts, args->channel_ctx);
> + if (!ctx) {
> + mutex_unlock(>lock);
> + pr_err_ratelimited("%s: %s: invalid channel_ctx '%d'", __func__,
> + current->comm, args->channel_ctx);
> + return -EINVAL;
> + }
> +
> + if (args->syncobj_in) {
> + struct dma_fence *fence;
> +
> + err = drm_syncobj_find_fence(file, args->syncobj_in, 0, 0, 
> );
> + if (err) {
> + SUBMIT_ERR(ctx, "invalid syncobj_in '%d'", 
> args->syncobj_in);
> + goto unlock;
> + }
> +
> + err = dma_fence_wait_timeout(fence, true, 
> msecs_to_jiffies(1));
> + dma_fence_put(fence);
> + if (err) {
> + SUBMIT_ERR(ctx, "wait for syncobj_in timed out");
> + goto unlock;
> + }
> + }
> +
> + if (args->syncobj_out) {
> + syncobj = drm_syncobj_find(file, args->syncobj_out);
> + if (!syncobj) {
> + SUBMIT_ERR(ctx, "invalid syncobj_out '%d'", 
> args->syncobj_out);
> + err = -ENOENT;
> + goto unlock;
> + }
> + }
> +
> + /* Allocate gather BO and copy gather words in. */
> + err = submit_copy_gather_data(, drm->dev, ctx, args);
> + if (err)
> + goto unlock;
> +
> + job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
> + if (!job_data) {
> + SUBMIT_ERR(ctx, "failed to allocate memory for job data");
> + err = -ENOMEM;
> + goto put_bo;
> + }
> +
> + /* Get data buffer mappings and do relocation patching. */
> + err = submit_process_bufs(ctx, bo, args, job_data);
> + if (err)
> + goto free_job_data;
> +
> + /* Allocate host1x_job and add gathers and waits to it. */
> + err = submit_create_job(, ctx, bo, args, job_data,
> + >syncpoints);
> + if (err)
> + goto free_job_data;