Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-23 Thread Boris Brezillon
On Sun, 22 Jan 2023 17:48:37 +
Matthew Brost  wrote:

> On Fri, Jan 20, 2023 at 11:22:45AM +0100, Boris Brezillon wrote:
> > On Thu, 19 Jan 2023 16:38:06 +
> > Matthew Brost  wrote:
> >   
> > > On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:  
> > > > On 1/19/23 05:58, Matthew Brost wrote:
> > > > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> > > > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > > > > > > 
> > > > > > > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > > > > > > This commit provides the implementation for the new uapi 
> > > > > > > > motivated by the
> > > > > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > > > > 
> > > > > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > > > > >      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the 
> > > > > > > > portion of VA
> > > > > > > >      space managed by the kernel and userspace, respectively.
> > > > > > > > 
> > > > > > > > 2) Allocate and free a VA space region as well as bind and 
> > > > > > > > unbind memory
> > > > > > > >      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
> > > > > > > > ioctl.
> > > > > > > >      UMDs can request the named operations to be processed 
> > > > > > > > either
> > > > > > > >      synchronously or asynchronously. It supports DRM syncobjs
> > > > > > > >      (incl. timelines) as synchronization mechanism. The 
> > > > > > > > management of the
> > > > > > > >      GPU VA mappings is implemented with the DRM GPU VA manager.
> > > > > > > > 
> > > > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC 
> > > > > > > > ioctl. The
> > > > > > > >      execution happens asynchronously. It supports DRM syncobj 
> > > > > > > > (incl.
> > > > > > > >      timelines) as synchronization mechanism. DRM GEM object 
> > > > > > > > locking is
> > > > > > > >      handled with drm_exec.
> > > > > > > > 
> > > > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use 
> > > > > > > > the DRM
> > > > > > > > GPU scheduler for the asynchronous paths.
> > > > > > > > 
> > > > > > > > Signed-off-by: Danilo Krummrich 
> > > > > > > > ---
> > > > > > > >    Documentation/gpu/driver-uapi.rst   |   3 +
> > > > > > > >    drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > > > > >    drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > > > > 
> > > > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > > > > >    11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h   
> > > > > > > >  
> > > > > > > ...
> > > > > > > > 
> > > > > > > > +static struct dma_fence *
> > > > > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > > > > +{
> > > > > > > > +    struct nouveau_bind_job *bind_job = 
> > > > > > > > to_nouveau_bind_job(job);
> > > > > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > > > > +    struct bind_job_op *op;
> > > > > > > > +    int ret = 0;
> > > > > > > > +
> > > > > > > 
> > > > > > > I was looking at how nouveau does the async binding compared to 
> > > > > > > how xe
> > > > > > > does it.
> > > > > > > It looks to me that this function being a scheduler run_job 
> > > > > > > callback is
> > > > > > > the main part of the VM_BIND dma-fence signalling critical 
> > > > > > > section for
> > > > > > > the job's done_fence and if so, needs to be annotated as such?
> > > > > > 
> > > > > > Yes, that's the case.
> > > > > > 
> > > > > > > 
> > > > > > > For example nouveau_uvma_region_new allocates memory, which is not
> > > > > > > allowed if in a dma_fence signalling critical section and the 
> > > > > > > locking
> > > > > > > also looks suspicious?
> > > > > > 
> > > > > > Thanks for pointing this out, I missed that somehow.
> > > > > > 
> > > > > > I will change it to pre-allocate new regions, mappings and page 
> > > > > > tables
> > > > > > within the job's submit() function.
> > > > > > 
> > > > > 
> > > > > Yea that what we basically do in Xe, in the IOCTL step allocate all 
> > > > > the
> > > > > backing store for new page tables, populate new 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-22 Thread Matthew Brost
On Fri, Jan 20, 2023 at 11:22:45AM +0100, Boris Brezillon wrote:
> On Thu, 19 Jan 2023 16:38:06 +
> Matthew Brost  wrote:
> 
> > On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:
> > > On 1/19/23 05:58, Matthew Brost wrote:  
> > > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:  
> > > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:  
> > > > > > 
> > > > > > On 1/18/23 07:12, Danilo Krummrich wrote:  
> > > > > > > This commit provides the implementation for the new uapi 
> > > > > > > motivated by the
> > > > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > > > 
> > > > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > > > >      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the 
> > > > > > > portion of VA
> > > > > > >      space managed by the kernel and userspace, respectively.
> > > > > > > 
> > > > > > > 2) Allocate and free a VA space region as well as bind and unbind 
> > > > > > > memory
> > > > > > >      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
> > > > > > > ioctl.
> > > > > > >      UMDs can request the named operations to be processed either
> > > > > > >      synchronously or asynchronously. It supports DRM syncobjs
> > > > > > >      (incl. timelines) as synchronization mechanism. The 
> > > > > > > management of the
> > > > > > >      GPU VA mappings is implemented with the DRM GPU VA manager.
> > > > > > > 
> > > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC 
> > > > > > > ioctl. The
> > > > > > >      execution happens asynchronously. It supports DRM syncobj 
> > > > > > > (incl.
> > > > > > >      timelines) as synchronization mechanism. DRM GEM object 
> > > > > > > locking is
> > > > > > >      handled with drm_exec.
> > > > > > > 
> > > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use 
> > > > > > > the DRM
> > > > > > > GPU scheduler for the asynchronous paths.
> > > > > > > 
> > > > > > > Signed-off-by: Danilo Krummrich 
> > > > > > > ---
> > > > > > >    Documentation/gpu/driver-uapi.rst   |   3 +
> > > > > > >    drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > > > >    drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > > > 
> > > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > > > >    11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h  
> > > > > > ...  
> > > > > > > 
> > > > > > > +static struct dma_fence *
> > > > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > > > +{
> > > > > > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > > > +    struct bind_job_op *op;
> > > > > > > +    int ret = 0;
> > > > > > > +  
> > > > > > 
> > > > > > I was looking at how nouveau does the async binding compared to how 
> > > > > > xe
> > > > > > does it.
> > > > > > It looks to me that this function being a scheduler run_job 
> > > > > > callback is
> > > > > > the main part of the VM_BIND dma-fence signalling critical section 
> > > > > > for
> > > > > > the job's done_fence and if so, needs to be annotated as such?  
> > > > > 
> > > > > Yes, that's the case.
> > > > >   
> > > > > > 
> > > > > > For example nouveau_uvma_region_new allocates memory, which is not
> > > > > > allowed if in a dma_fence signalling critical section and the 
> > > > > > locking
> > > > > > also looks suspicious?  
> > > > > 
> > > > > Thanks for pointing this out, I missed that somehow.
> > > > > 
> > > > > I will change it to pre-allocate new regions, mappings and page tables
> > > > > within the job's submit() function.
> > > > >   
> > > > 
> > > > Yea that what we basically do in Xe, in the IOCTL step allocate all the
> > > > backing store for new page tables, populate new page tables (these are
> > > > not yet visible in the page table structure), and in last step which is
> > > > executed after all the dependencies are satified program all the leaf
> > > > entires making the new binding visible.
> > > > 
> > > > We screwed have this up by defering most of the IOCTL to a worker but
> > > > will fix this fix this one 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-20 Thread Boris Brezillon
On Thu, 19 Jan 2023 16:38:06 +
Matthew Brost  wrote:

> On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:
> > On 1/19/23 05:58, Matthew Brost wrote:  
> > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:  
> > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:  
> > > > > 
> > > > > On 1/18/23 07:12, Danilo Krummrich wrote:  
> > > > > > This commit provides the implementation for the new uapi motivated 
> > > > > > by the
> > > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > > 
> > > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > > >      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the 
> > > > > > portion of VA
> > > > > >      space managed by the kernel and userspace, respectively.
> > > > > > 
> > > > > > 2) Allocate and free a VA space region as well as bind and unbind 
> > > > > > memory
> > > > > >      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
> > > > > > ioctl.
> > > > > >      UMDs can request the named operations to be processed either
> > > > > >      synchronously or asynchronously. It supports DRM syncobjs
> > > > > >      (incl. timelines) as synchronization mechanism. The management 
> > > > > > of the
> > > > > >      GPU VA mappings is implemented with the DRM GPU VA manager.
> > > > > > 
> > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. 
> > > > > > The
> > > > > >      execution happens asynchronously. It supports DRM syncobj 
> > > > > > (incl.
> > > > > >      timelines) as synchronization mechanism. DRM GEM object 
> > > > > > locking is
> > > > > >      handled with drm_exec.
> > > > > > 
> > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the 
> > > > > > DRM
> > > > > > GPU scheduler for the asynchronous paths.
> > > > > > 
> > > > > > Signed-off-by: Danilo Krummrich 
> > > > > > ---
> > > > > >    Documentation/gpu/driver-uapi.rst   |   3 +
> > > > > >    drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > > >    drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > > >    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > > >    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > > >    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > > 
> > > > > >    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > > >    11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h  
> > > > > ...  
> > > > > > 
> > > > > > +static struct dma_fence *
> > > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > > +{
> > > > > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > > +    struct bind_job_op *op;
> > > > > > +    int ret = 0;
> > > > > > +  
> > > > > 
> > > > > I was looking at how nouveau does the async binding compared to how xe
> > > > > does it.
> > > > > It looks to me that this function being a scheduler run_job callback 
> > > > > is
> > > > > the main part of the VM_BIND dma-fence signalling critical section for
> > > > > the job's done_fence and if so, needs to be annotated as such?  
> > > > 
> > > > Yes, that's the case.
> > > >   
> > > > > 
> > > > > For example nouveau_uvma_region_new allocates memory, which is not
> > > > > allowed if in a dma_fence signalling critical section and the locking
> > > > > also looks suspicious?  
> > > > 
> > > > Thanks for pointing this out, I missed that somehow.
> > > > 
> > > > I will change it to pre-allocate new regions, mappings and page tables
> > > > within the job's submit() function.
> > > >   
> > > 
> > > Yea that what we basically do in Xe, in the IOCTL step allocate all the
> > > backing store for new page tables, populate new page tables (these are
> > > not yet visible in the page table structure), and in last step which is
> > > executed after all the dependencies are satified program all the leaf
> > > entires making the new binding visible.
> > > 
> > > We screwed have this up by defering most of the IOCTL to a worker but
> > > will fix this fix this one way or another soon - get rid of worker or
> > > introduce a type of sync that is signaled after the worker + publish the
> > > dma-fence in the worker. I'd like to close on this one soon.  
> > > > For the ops structures the drm_gpuva_manager allocates for reporting the
> > > > split/merge steps 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-20 Thread Boris Brezillon
On Thu, 19 Jan 2023 04:58:48 +
Matthew Brost  wrote:

> > For the ops structures the drm_gpuva_manager allocates for reporting the
> > split/merge steps back to the driver I have ideas to entirely avoid
> > allocations, which also is a good thing in respect of Christians feedback
> > regarding the huge amount of mapping requests some applications seem to
> > generate.
> >  
> 
> It should be fine to have allocations to report the split/merge step as
> this step should be before a dma-fence is published, but yea if possible
> to avoid extra allocs as that is always better.
> 
> Also BTW, great work on drm_gpuva_manager too. We will almost likely
> pick this up in Xe rather than open coding all of this as we currently
> do. We should probably start the port to this soon so we can contribute
> to the implementation and get both of our drivers upstream sooner.

Also quite interested in using this drm_gpuva_manager for pancsf, since
I've been open-coding something similar. Didn't have the
gpuva_region concept to make sure VA mapping/unmapping requests don't
don't go outside a pre-reserved region, but it seems to automate some
of the stuff I've been doing quite nicely.


Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Matthew Brost
On Thu, Jan 19, 2023 at 11:25:51PM +0100, Danilo Krummrich wrote:
> On 1/19/23 22:47, Matthew Brost wrote:
> > On Thu, Jan 19, 2023 at 06:46:30PM +0100, Danilo Krummrich wrote:
> > > 
> > > 
> > > On 1/19/23 17:38, Matthew Brost wrote:
> > > > On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:
> > > > > On 1/19/23 05:58, Matthew Brost wrote:
> > > > > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> > > > > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > > > > > > > 
> > > > > > > > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > > > > > > > This commit provides the implementation for the new uapi 
> > > > > > > > > motivated by the
> > > > > > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > > > > > 
> > > > > > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > > > > > >    DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify 
> > > > > > > > > the portion of VA
> > > > > > > > >    space managed by the kernel and userspace, 
> > > > > > > > > respectively.
> > > > > > > > > 
> > > > > > > > > 2) Allocate and free a VA space region as well as bind and 
> > > > > > > > > unbind memory
> > > > > > > > >    to the GPUs VA space via the new 
> > > > > > > > > DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> > > > > > > > >    UMDs can request the named operations to be processed 
> > > > > > > > > either
> > > > > > > > >    synchronously or asynchronously. It supports DRM 
> > > > > > > > > syncobjs
> > > > > > > > >    (incl. timelines) as synchronization mechanism. The 
> > > > > > > > > management of the
> > > > > > > > >    GPU VA mappings is implemented with the DRM GPU VA 
> > > > > > > > > manager.
> > > > > > > > > 
> > > > > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC 
> > > > > > > > > ioctl. The
> > > > > > > > >    execution happens asynchronously. It supports DRM 
> > > > > > > > > syncobj (incl.
> > > > > > > > >    timelines) as synchronization mechanism. DRM GEM 
> > > > > > > > > object locking is
> > > > > > > > >    handled with drm_exec.
> > > > > > > > > 
> > > > > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, 
> > > > > > > > > use the DRM
> > > > > > > > > GPU scheduler for the asynchronous paths.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Danilo Krummrich 
> > > > > > > > > ---
> > > > > > > > >  Documentation/gpu/driver-uapi.rst   |   3 +
> > > > > > > > >  drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > > > > > >  drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > > > > > 
> > > > > > > > >  drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > > > > > >  11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > > > > > >  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > > > > > >  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > > > > > >  create mode 100644 
> > > > > > > > > drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > > > > > >  create mode 100644 
> > > > > > > > > drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > > > > > ...
> > > > > > > > > 
> > > > > > > > > +static struct dma_fence *
> > > > > > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > > > > > +{
> > > > > > > > > +    struct nouveau_bind_job *bind_job = 
> > > > > > > > > to_nouveau_bind_job(job);
> > > > > > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > > > > > +    struct bind_job_op *op;
> > > > > > > > > +    int ret = 0;
> > > > > > > > > +
> > > > > > > > 
> > > > > > > > I was looking at how nouveau does the async binding compared to 
> > > > > > > > how xe
> > > > > > > > does it.
> > > > > > > > It looks to me that this function being a scheduler run_job 
> > > > > > > > callback is
> > > > > > > > the main part of the VM_BIND dma-fence signalling critical 
> > > > > > > > section for
> > > > > > > > the job's done_fence and if so, needs to be annotated as such?
> > > > > > > 
> > > > > > > Yes, that's the case.
> > > > > > > 
> > > > > > > > 
> > > > > > > > For example nouveau_uvma_region_new allocates memory, which is 
> > > > > > > > not
> > > > > > > > allowed if in a dma_fence signalling critical section and the 
> > > > > > > > locking
> > > > > > > > also looks suspicious?
> > > > > > > 
> > > > > > > Thanks for pointing this out, I missed that somehow.
> > > > > > > 
> > 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Danilo Krummrich

On 1/19/23 22:47, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 06:46:30PM +0100, Danilo Krummrich wrote:



On 1/19/23 17:38, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:

On 1/19/23 05:58, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:


On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
       DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
       space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
       UMDs can request the named operations to be processed either
       synchronously or asynchronously. It supports DRM syncobjs
       (incl. timelines) as synchronization mechanism. The management of the
       GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
       execution happens asynchronously. It supports DRM syncobj (incl.
       timelines) as synchronization mechanism. DRM GEM object locking is
       handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
     Documentation/gpu/driver-uapi.rst   |   3 +
     drivers/gpu/drm/nouveau/Kbuild  |   2 +
     drivers/gpu/drm/nouveau/Kconfig |   2 +
     drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
     drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
     drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
     drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
     drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
     drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
     drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
     drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
     11 files changed, 1295 insertions(+), 4 deletions(-)
     create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
     create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
     create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
     create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function being a scheduler run_job callback is
the main part of the VM_BIND dma-fence signalling critical section for
the job's done_fence and if so, needs to be annotated as such?


Yes, that's the case.



For example nouveau_uvma_region_new allocates memory, which is not
allowed if in a dma_fence signalling critical section and the locking
also looks suspicious?


Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables
within the job's submit() function.



Yea that what we basically do in Xe, in the IOCTL step allocate all the
backing store for new page tables, populate new page tables (these are
not yet visible in the page table structure), and in last step which is
executed after all the dependencies are satified program all the leaf
entires making the new binding visible.

We screwed have this up by defering most of the IOCTL to a worker but
will fix this fix this one way or another soon - get rid of worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.

For the ops structures the drm_gpuva_manager allocates for reporting the
split/merge steps back to the driver I have ideas to entirely avoid
allocations, which also is a good thing in respect of Christians feedback
regarding the huge amount of mapping requests some applications seem to
generate.



It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.


I think we can't really ask for the split/merge steps before actually
running the job, since it requires the particular VA space not to change
while performing those operations.

E.g. if we'd run the split/merge steps at job submit() time the underlying
VA space could be changed by other bind jobs executing before this one,
which would make the calculated split/merge steps obsolete and wrong.



Hmm, maybe I'm not understanding this implementation, admittedly I
haven't studied the gpuva 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Matthew Brost
On Thu, Jan 19, 2023 at 06:46:30PM +0100, Danilo Krummrich wrote:
> 
> 
> On 1/19/23 17:38, Matthew Brost wrote:
> > On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:
> > > On 1/19/23 05:58, Matthew Brost wrote:
> > > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> > > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > > > > > 
> > > > > > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > > > > > This commit provides the implementation for the new uapi 
> > > > > > > motivated by the
> > > > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > > > 
> > > > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > > > >       DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the 
> > > > > > > portion of VA
> > > > > > >       space managed by the kernel and userspace, respectively.
> > > > > > > 
> > > > > > > 2) Allocate and free a VA space region as well as bind and unbind 
> > > > > > > memory
> > > > > > >       to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND 
> > > > > > > ioctl.
> > > > > > >       UMDs can request the named operations to be processed either
> > > > > > >       synchronously or asynchronously. It supports DRM syncobjs
> > > > > > >       (incl. timelines) as synchronization mechanism. The 
> > > > > > > management of the
> > > > > > >       GPU VA mappings is implemented with the DRM GPU VA manager.
> > > > > > > 
> > > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC 
> > > > > > > ioctl. The
> > > > > > >       execution happens asynchronously. It supports DRM syncobj 
> > > > > > > (incl.
> > > > > > >       timelines) as synchronization mechanism. DRM GEM object 
> > > > > > > locking is
> > > > > > >       handled with drm_exec.
> > > > > > > 
> > > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use 
> > > > > > > the DRM
> > > > > > > GPU scheduler for the asynchronous paths.
> > > > > > > 
> > > > > > > Signed-off-by: Danilo Krummrich 
> > > > > > > ---
> > > > > > >     Documentation/gpu/driver-uapi.rst   |   3 +
> > > > > > >     drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > > > >     drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > > > 
> > > > > > >     drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > > > >     11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > > > >     create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > > > >     create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > > > >     create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > > > >     create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > > > ...
> > > > > > > 
> > > > > > > +static struct dma_fence *
> > > > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > > > +{
> > > > > > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > > > +    struct bind_job_op *op;
> > > > > > > +    int ret = 0;
> > > > > > > +
> > > > > > 
> > > > > > I was looking at how nouveau does the async binding compared to how 
> > > > > > xe
> > > > > > does it.
> > > > > > It looks to me that this function being a scheduler run_job 
> > > > > > callback is
> > > > > > the main part of the VM_BIND dma-fence signalling critical section 
> > > > > > for
> > > > > > the job's done_fence and if so, needs to be annotated as such?
> > > > > 
> > > > > Yes, that's the case.
> > > > > 
> > > > > > 
> > > > > > For example nouveau_uvma_region_new allocates memory, which is not
> > > > > > allowed if in a dma_fence signalling critical section and the 
> > > > > > locking
> > > > > > also looks suspicious?
> > > > > 
> > > > > Thanks for pointing this out, I missed that somehow.
> > > > > 
> > > > > I will change it to pre-allocate new regions, mappings and page tables
> > > > > within the job's submit() function.
> > > > > 
> > > > 
> > > > Yea that what we basically do in Xe, in the IOCTL step allocate all the
> > > > backing store for new page tables, populate new page tables (these are
> > > > not yet visible in the page table structure), and in last step which is
> > > > executed after all the dependencies are satified program all the leaf
> > > > entires making the new binding visible.
> > > > 
> > > > We screwed have this up by defering most of the IOCTL to a worker but
> > > > will fix this fix this one way or 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Danilo Krummrich




On 1/19/23 17:38, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:

On 1/19/23 05:58, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:


On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
      space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
      UMDs can request the named operations to be processed either
      synchronously or asynchronously. It supports DRM syncobjs
      (incl. timelines) as synchronization mechanism. The management of the
      GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
      execution happens asynchronously. It supports DRM syncobj (incl.
      timelines) as synchronization mechanism. DRM GEM object locking is
      handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
    Documentation/gpu/driver-uapi.rst   |   3 +
    drivers/gpu/drm/nouveau/Kbuild  |   2 +
    drivers/gpu/drm/nouveau/Kconfig |   2 +
    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
    11 files changed, 1295 insertions(+), 4 deletions(-)
    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function being a scheduler run_job callback is
the main part of the VM_BIND dma-fence signalling critical section for
the job's done_fence and if so, needs to be annotated as such?


Yes, that's the case.



For example nouveau_uvma_region_new allocates memory, which is not
allowed if in a dma_fence signalling critical section and the locking
also looks suspicious?


Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables
within the job's submit() function.



Yea that what we basically do in Xe, in the IOCTL step allocate all the
backing store for new page tables, populate new page tables (these are
not yet visible in the page table structure), and in last step which is
executed after all the dependencies are satified program all the leaf
entires making the new binding visible.

We screwed have this up by defering most of the IOCTL to a worker but
will fix this fix this one way or another soon - get rid of worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.

For the ops structures the drm_gpuva_manager allocates for reporting the
split/merge steps back to the driver I have ideas to entirely avoid
allocations, which also is a good thing in respect of Christians feedback
regarding the huge amount of mapping requests some applications seem to
generate.



It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.


I think we can't really ask for the split/merge steps before actually
running the job, since it requires the particular VA space not to change
while performing those operations.

E.g. if we'd run the split/merge steps at job submit() time the underlying
VA space could be changed by other bind jobs executing before this one,
which would make the calculated split/merge steps obsolete and wrong.



Hmm, maybe I'm not understanding this implementation, admittedly I
haven't studied the gpuva manager code in detail.


The limitation I mentioned above doesn't really come from the 
drm_gpuva_manager, but from how the driver 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Matthew Brost
On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote:
> On 1/19/23 05:58, Matthew Brost wrote:
> > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > > > 
> > > > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > > > This commit provides the implementation for the new uapi motivated by 
> > > > > the
> > > > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > > > 
> > > > > 1) Initialize a GPU virtual address (VA) space via the new
> > > > >      DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion 
> > > > > of VA
> > > > >      space managed by the kernel and userspace, respectively.
> > > > > 
> > > > > 2) Allocate and free a VA space region as well as bind and unbind 
> > > > > memory
> > > > >      to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> > > > >      UMDs can request the named operations to be processed either
> > > > >      synchronously or asynchronously. It supports DRM syncobjs
> > > > >      (incl. timelines) as synchronization mechanism. The management 
> > > > > of the
> > > > >      GPU VA mappings is implemented with the DRM GPU VA manager.
> > > > > 
> > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
> > > > >      execution happens asynchronously. It supports DRM syncobj (incl.
> > > > >      timelines) as synchronization mechanism. DRM GEM object locking 
> > > > > is
> > > > >      handled with drm_exec.
> > > > > 
> > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the 
> > > > > DRM
> > > > > GPU scheduler for the asynchronous paths.
> > > > > 
> > > > > Signed-off-by: Danilo Krummrich 
> > > > > ---
> > > > >    Documentation/gpu/driver-uapi.rst   |   3 +
> > > > >    drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > > > >    drivers/gpu/drm/nouveau/Kconfig |   2 +
> > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > > > >    drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > > > >    drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > > > >    drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > > > >    drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > > > >    drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > > > >    drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > > > > 
> > > > >    drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > > > >    11 files changed, 1295 insertions(+), 4 deletions(-)
> > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > >    create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > ...
> > > > > 
> > > > > +static struct dma_fence *
> > > > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > > > +{
> > > > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > > > +    struct bind_job_op *op;
> > > > > +    int ret = 0;
> > > > > +
> > > > 
> > > > I was looking at how nouveau does the async binding compared to how xe
> > > > does it.
> > > > It looks to me that this function being a scheduler run_job callback is
> > > > the main part of the VM_BIND dma-fence signalling critical section for
> > > > the job's done_fence and if so, needs to be annotated as such?
> > > 
> > > Yes, that's the case.
> > > 
> > > > 
> > > > For example nouveau_uvma_region_new allocates memory, which is not
> > > > allowed if in a dma_fence signalling critical section and the locking
> > > > also looks suspicious?
> > > 
> > > Thanks for pointing this out, I missed that somehow.
> > > 
> > > I will change it to pre-allocate new regions, mappings and page tables
> > > within the job's submit() function.
> > > 
> > 
> > Yea that what we basically do in Xe, in the IOCTL step allocate all the
> > backing store for new page tables, populate new page tables (these are
> > not yet visible in the page table structure), and in last step which is
> > executed after all the dependencies are satified program all the leaf
> > entires making the new binding visible.
> > 
> > We screwed have this up by defering most of the IOCTL to a worker but
> > will fix this fix this one way or another soon - get rid of worker or
> > introduce a type of sync that is signaled after the worker + publish the
> > dma-fence in the worker. I'd like to close on this one soon.
> > > For the ops structures the drm_gpuva_manager allocates for reporting the
> > > split/merge steps back to the driver I have ideas to entirely avoid
> > > allocations, which also is a good thing in respect of Christians feedback
> > > regarding the huge amount of mapping requests some applications seem to
> > > generate.
> > > 
> > 
> > It should be fine to have allocations to report the split/merge step as
> > this 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-19 Thread Danilo Krummrich

On 1/19/23 05:58, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:


On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
     space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
     UMDs can request the named operations to be processed either
     synchronously or asynchronously. It supports DRM syncobjs
     (incl. timelines) as synchronization mechanism. The management of the
     GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
     execution happens asynchronously. It supports DRM syncobj (incl.
     timelines) as synchronization mechanism. DRM GEM object locking is
     handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
   Documentation/gpu/driver-uapi.rst   |   3 +
   drivers/gpu/drm/nouveau/Kbuild  |   2 +
   drivers/gpu/drm/nouveau/Kconfig |   2 +
   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
   11 files changed, 1295 insertions(+), 4 deletions(-)
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function being a scheduler run_job callback is
the main part of the VM_BIND dma-fence signalling critical section for
the job's done_fence and if so, needs to be annotated as such?


Yes, that's the case.



For example nouveau_uvma_region_new allocates memory, which is not
allowed if in a dma_fence signalling critical section and the locking
also looks suspicious?


Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables
within the job's submit() function.



Yea that what we basically do in Xe, in the IOCTL step allocate all the
backing store for new page tables, populate new page tables (these are
not yet visible in the page table structure), and in last step which is
executed after all the dependencies are satified program all the leaf
entires making the new binding visible.

We screwed have this up by defering most of the IOCTL to a worker but
will fix this fix this one way or another soon - get rid of worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.
  

For the ops structures the drm_gpuva_manager allocates for reporting the
split/merge steps back to the driver I have ideas to entirely avoid
allocations, which also is a good thing in respect of Christians feedback
regarding the huge amount of mapping requests some applications seem to
generate.



It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.


I think we can't really ask for the split/merge steps before actually 
running the job, since it requires the particular VA space not to change 
while performing those operations.


E.g. if we'd run the split/merge steps at job submit() time the 
underlying VA space could be changed by other bind jobs executing before 
this one, which would make the calculated split/merge steps obsolete and 
wrong.


Anyway, I should be able to get rid of all the allocations to make this 
safe.




Also BTW, great work on drm_gpuva_manager too. We will almost likely
pick this up in Xe rather than open coding all of this as we currently
do. We should probably start the port to this soon so we can contribute
to the implementation and get both of our drivers upstream 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-18 Thread Intel



On 1/19/23 05:58, Matthew Brost wrote:

On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:

On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
     space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
     UMDs can request the named operations to be processed either
     synchronously or asynchronously. It supports DRM syncobjs
     (incl. timelines) as synchronization mechanism. The management of the
     GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
     execution happens asynchronously. It supports DRM syncobj (incl.
     timelines) as synchronization mechanism. DRM GEM object locking is
     handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
   Documentation/gpu/driver-uapi.rst   |   3 +
   drivers/gpu/drm/nouveau/Kbuild  |   2 +
   drivers/gpu/drm/nouveau/Kconfig |   2 +
   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
   11 files changed, 1295 insertions(+), 4 deletions(-)
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...

+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+

I was looking at how nouveau does the async binding compared to how xe
does it.
It looks to me that this function being a scheduler run_job callback is
the main part of the VM_BIND dma-fence signalling critical section for
the job's done_fence and if so, needs to be annotated as such?

Yes, that's the case.


For example nouveau_uvma_region_new allocates memory, which is not
allowed if in a dma_fence signalling critical section and the locking
also looks suspicious?

Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables
within the job's submit() function.


Yea that what we basically do in Xe, in the IOCTL step allocate all the
backing store for new page tables, populate new page tables (these are
not yet visible in the page table structure), and in last step which is
executed after all the dependencies are satified program all the leaf
entires making the new binding visible.

We screwed have this up by defering most of the IOCTL to a worker but
will fix this fix this one way or another soon - get rid of worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.
  

For the ops structures the drm_gpuva_manager allocates for reporting the
split/merge steps back to the driver I have ideas to entirely avoid
allocations, which also is a good thing in respect of Christians feedback
regarding the huge amount of mapping requests some applications seem to
generate.


It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.

Also BTW, great work on drm_gpuva_manager too. We will almost likely
pick this up in Xe rather than open coding all of this as we currently
do. We should probably start the port to this soon so we can contribute
to the implementation and get both of our drivers upstream sooner.
  

Regarding the locking, anything specific that makes it look suspicious to
you?


I haven't looked into this too but almost certainly Thomas is suggesting
that if you allocate memory anywhere under the nouveau_uvmm_lock then
you can't use this lock in the run_job() callback as this in the
dma-fencing path.


Yes, that was what looked suspicious to me, although I haven't either 
looked at the code in detail to say for sure.


But starting by annotating this with dma_fence_[begin | 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-18 Thread Matthew Brost
On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote:
> On 1/18/23 21:37, Thomas Hellström (Intel) wrote:
> > 
> > On 1/18/23 07:12, Danilo Krummrich wrote:
> > > This commit provides the implementation for the new uapi motivated by the
> > > Vulkan API. It allows user mode drivers (UMDs) to:
> > > 
> > > 1) Initialize a GPU virtual address (VA) space via the new
> > >     DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
> > >     space managed by the kernel and userspace, respectively.
> > > 
> > > 2) Allocate and free a VA space region as well as bind and unbind memory
> > >     to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> > >     UMDs can request the named operations to be processed either
> > >     synchronously or asynchronously. It supports DRM syncobjs
> > >     (incl. timelines) as synchronization mechanism. The management of the
> > >     GPU VA mappings is implemented with the DRM GPU VA manager.
> > > 
> > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
> > >     execution happens asynchronously. It supports DRM syncobj (incl.
> > >     timelines) as synchronization mechanism. DRM GEM object locking is
> > >     handled with drm_exec.
> > > 
> > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
> > > GPU scheduler for the asynchronous paths.
> > > 
> > > Signed-off-by: Danilo Krummrich 
> > > ---
> > >   Documentation/gpu/driver-uapi.rst   |   3 +
> > >   drivers/gpu/drm/nouveau/Kbuild  |   2 +
> > >   drivers/gpu/drm/nouveau/Kconfig |   2 +
> > >   drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
> > >   drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
> > >   drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
> > >   drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
> > >   drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
> > >   drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
> > >   drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
> > >   drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
> > >   11 files changed, 1295 insertions(+), 4 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> > >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> > ...
> > > 
> > > +static struct dma_fence *
> > > +nouveau_bind_job_run(struct nouveau_job *job)
> > > +{
> > > +    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
> > > +    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
> > > +    struct bind_job_op *op;
> > > +    int ret = 0;
> > > +
> > 
> > I was looking at how nouveau does the async binding compared to how xe
> > does it.
> > It looks to me that this function being a scheduler run_job callback is
> > the main part of the VM_BIND dma-fence signalling critical section for
> > the job's done_fence and if so, needs to be annotated as such?
> 
> Yes, that's the case.
> 
> > 
> > For example nouveau_uvma_region_new allocates memory, which is not
> > allowed if in a dma_fence signalling critical section and the locking
> > also looks suspicious?
> 
> Thanks for pointing this out, I missed that somehow.
> 
> I will change it to pre-allocate new regions, mappings and page tables
> within the job's submit() function.
>

Yea that what we basically do in Xe, in the IOCTL step allocate all the
backing store for new page tables, populate new page tables (these are
not yet visible in the page table structure), and in last step which is
executed after all the dependencies are satified program all the leaf
entires making the new binding visible.

We screwed have this up by defering most of the IOCTL to a worker but
will fix this fix this one way or another soon - get rid of worker or
introduce a type of sync that is signaled after the worker + publish the
dma-fence in the worker. I'd like to close on this one soon.
 
> For the ops structures the drm_gpuva_manager allocates for reporting the
> split/merge steps back to the driver I have ideas to entirely avoid
> allocations, which also is a good thing in respect of Christians feedback
> regarding the huge amount of mapping requests some applications seem to
> generate.
>

It should be fine to have allocations to report the split/merge step as
this step should be before a dma-fence is published, but yea if possible
to avoid extra allocs as that is always better.

Also BTW, great work on drm_gpuva_manager too. We will almost likely
pick this up in Xe rather than open coding all of this as we currently
do. We should probably start the port to this soon so we can contribute
to the implementation and get both of our drivers upstream sooner.
 
> Regarding the locking, anything specific that makes it look suspicious to
> you?
> 

I haven't looked into this too but almost certainly Thomas is suggesting
that if you allocate memory 

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-18 Thread Danilo Krummrich

On 1/18/23 21:37, Thomas Hellström (Intel) wrote:


On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
    DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
    space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
    to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
    UMDs can request the named operations to be processed either
    synchronously or asynchronously. It supports DRM syncobjs
    (incl. timelines) as synchronization mechanism. The management of the
    GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
    execution happens asynchronously. It supports DRM syncobj (incl.
    timelines) as synchronization mechanism. DRM GEM object locking is
    handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
  Documentation/gpu/driver-uapi.rst   |   3 +
  drivers/gpu/drm/nouveau/Kbuild  |   2 +
  drivers/gpu/drm/nouveau/Kconfig |   2 +
  drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
  drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
  drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
  drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
  drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
  drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
  drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
  11 files changed, 1295 insertions(+), 4 deletions(-)
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+    struct bind_job_op *op;
+    int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe 
does it.
It looks to me that this function being a scheduler run_job callback is 
the main part of the VM_BIND dma-fence signalling critical section for 
the job's done_fence and if so, needs to be annotated as such?


Yes, that's the case.



For example nouveau_uvma_region_new allocates memory, which is not 
allowed if in a dma_fence signalling critical section and the locking 
also looks suspicious?


Thanks for pointing this out, I missed that somehow.

I will change it to pre-allocate new regions, mappings and page tables 
within the job's submit() function.


For the ops structures the drm_gpuva_manager allocates for reporting the 
split/merge steps back to the driver I have ideas to entirely avoid 
allocations, which also is a good thing in respect of Christians 
feedback regarding the huge amount of mapping requests some applications 
seem to generate.


Regarding the locking, anything specific that makes it look suspicious 
to you?




Thanks,

Thomas



+    nouveau_uvmm_lock(uvmm);
+    list_for_each_op(op, _job->ops) {
+    switch (op->op) {
+    case OP_ALLOC: {
+    bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
+
+    ret = nouveau_uvma_region_new(uvmm,
+  op->va.addr,
+  op->va.range,
+  sparse);
+    if (ret)
+    goto out_unlock;
+    break;
+    }
+    case OP_FREE:
+    ret = nouveau_uvma_region_destroy(uvmm,
+  op->va.addr,
+  op->va.range);
+    if (ret)
+    goto out_unlock;
+    break;
+    case OP_MAP:
+    ret = nouveau_uvmm_sm_map(uvmm,
+  op->va.addr, op->va.range,
+  op->gem.obj, op->gem.offset,
+  op->flags && 0xff);
+    if (ret)
+    goto out_unlock;
+    break;
+    case OP_UNMAP:
+    ret = nouveau_uvmm_sm_unmap(uvmm,
+    op->va.addr,
+    op->va.range);
+    if (ret)
+    goto out_unlock;
+    break;
+    }
+    }
+
+out_unlock:
+    nouveau_uvmm_unlock(uvmm);
+    if (ret)
+    NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
+    return ERR_PTR(ret);
+}
+
+static void
+nouveau_bind_job_free(struct nouveau_job *job)
+{
+    struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+    struct bind_job_op *op, *next;
+
+    

Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI

2023-01-18 Thread Intel



On 1/18/23 07:12, Danilo Krummrich wrote:

This commit provides the implementation for the new uapi motivated by the
Vulkan API. It allows user mode drivers (UMDs) to:

1) Initialize a GPU virtual address (VA) space via the new
DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA
space managed by the kernel and userspace, respectively.

2) Allocate and free a VA space region as well as bind and unbind memory
to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
UMDs can request the named operations to be processed either
synchronously or asynchronously. It supports DRM syncobjs
(incl. timelines) as synchronization mechanism. The management of the
GPU VA mappings is implemented with the DRM GPU VA manager.

3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The
execution happens asynchronously. It supports DRM syncobj (incl.
timelines) as synchronization mechanism. DRM GEM object locking is
handled with drm_exec.

Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM
GPU scheduler for the asynchronous paths.

Signed-off-by: Danilo Krummrich 
---
  Documentation/gpu/driver-uapi.rst   |   3 +
  drivers/gpu/drm/nouveau/Kbuild  |   2 +
  drivers/gpu/drm/nouveau/Kconfig |   2 +
  drivers/gpu/drm/nouveau/nouveau_abi16.c |  16 +
  drivers/gpu/drm/nouveau/nouveau_abi16.h |   1 +
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  23 +-
  drivers/gpu/drm/nouveau/nouveau_drv.h   |   9 +-
  drivers/gpu/drm/nouveau/nouveau_exec.c  | 310 ++
  drivers/gpu/drm/nouveau/nouveau_exec.h  |  55 ++
  drivers/gpu/drm/nouveau/nouveau_sched.c | 780 
  drivers/gpu/drm/nouveau/nouveau_sched.h |  98 +++
  11 files changed, 1295 insertions(+), 4 deletions(-)
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h

...


+static struct dma_fence *
+nouveau_bind_job_run(struct nouveau_job *job)
+{
+   struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+   struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli);
+   struct bind_job_op *op;
+   int ret = 0;
+


I was looking at how nouveau does the async binding compared to how xe 
does it.
It looks to me that this function being a scheduler run_job callback is 
the main part of the VM_BIND dma-fence signalling critical section for 
the job's done_fence and if so, needs to be annotated as such?


For example nouveau_uvma_region_new allocates memory, which is not 
allowed if in a dma_fence signalling critical section and the locking 
also looks suspicious?


Thanks,

Thomas



+   nouveau_uvmm_lock(uvmm);
+   list_for_each_op(op, _job->ops) {
+   switch (op->op) {
+   case OP_ALLOC: {
+   bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE;
+
+   ret = nouveau_uvma_region_new(uvmm,
+ op->va.addr,
+ op->va.range,
+ sparse);
+   if (ret)
+   goto out_unlock;
+   break;
+   }
+   case OP_FREE:
+   ret = nouveau_uvma_region_destroy(uvmm,
+ op->va.addr,
+ op->va.range);
+   if (ret)
+   goto out_unlock;
+   break;
+   case OP_MAP:
+   ret = nouveau_uvmm_sm_map(uvmm,
+ op->va.addr, op->va.range,
+ op->gem.obj, op->gem.offset,
+ op->flags && 0xff);
+   if (ret)
+   goto out_unlock;
+   break;
+   case OP_UNMAP:
+   ret = nouveau_uvmm_sm_unmap(uvmm,
+   op->va.addr,
+   op->va.range);
+   if (ret)
+   goto out_unlock;
+   break;
+   }
+   }
+
+out_unlock:
+   nouveau_uvmm_unlock(uvmm);
+   if (ret)
+   NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret);
+   return ERR_PTR(ret);
+}
+
+static void
+nouveau_bind_job_free(struct nouveau_job *job)
+{
+   struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job);
+   struct bind_job_op *op, *next;
+
+   list_for_each_op_safe(op, next, _job->ops) {
+   struct drm_gem_object *obj =