Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-18 Thread Niranjana Vishwanathapura

On Tue, Dec 17, 2019 at 12:01:26PM -0600, Jason Ekstrand wrote:

  On Sun, Dec 15, 2019 at 10:24 PM Niranjan Vishwanathapura
   wrote:

On Sat, Dec 14, 2019 at 10:31:37AM +, Chris Wilson wrote:
>Quoting Jason Ekstrand (2019-12-14 00:36:19)
>> On Fri, Dec 13, 2019 at 5:24 PM Niranjan Vishwanathapura <
>> niranjana.vishwanathap...@intel.com> wrote:
>>
>> On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:
>> >
>> > +/**
>> > + * struct drm_i915_gem_vm_bind
>> > + *
>> > + * Bind an object in a vm's page table.
>> >
>> >   First off, this is something I've wanted for a while for
Vulkan, it's
>> just
>> >   never made its way high enough up the priority list. 
However, it's

>> going
>> >   to have to come one way or another soon.  I'm glad to see
kernel API
>> for
>> >   this being proposed.
>> >   I do, however, have a few high-level comments/questions about
the API:
>> >1. In order to be useful for sparse memory support, the API
has to go
>> the
>> >   other way around so that it binds a VA range to a range
within the BO. 
>> It

>> >   also needs to be able to handle overlapping where two
different VA
>> ranges
>> >   may map to the same underlying bytes in the BO.  This likely
means that
>> >   unbind needs to also take a VA range and only unbind that
range.
>> >2. If this is going to be useful for managing GL's address
space where
>> we
>> >   have lots of BOs, we probably want it to take a list of
ranges so we
>> >   aren't making one ioctl for each thing we want to bind.
>>
>> Hi Jason,
>>
>> Yah, some of these requirements came up.
>>
>>  
>> Yes, I have raised them every single time an API like this has come

across my
>> e-mail inbox for years and they continue to get ignored.  Why are we
landing an
>> API that we know isn't the API we want especially when it's pretty
obvious
>> roughly what the API we want is?  It may be less time in the short
term, but
>> long-term it means two ioctls and two implementations in i915, IGT
tests for
>> both code paths, and code in all UMDs to call one or the other
depending on
>> what kernel you're running on, and we have to maintain all that code
going
>> forward forever.  Sure, that's a price we pay today for a variety of
things but
>> that's because they all seemed like the right thing at the time. 
Landing the

>> wrong API when we know it's the wrong API seems foolish.
>
>Exactly. This is not even close to the uAPI we need. Reposting an RFC
>without taking in the concerns last time (or the time before that, or
>the time before that...) suggests that you aren't really requesting for
>comments at all.

Thanks Jason for detailed exlanation.
Chris, all comments and guidance are much appreciated :)

I haven't looked in detail, but my concern is that implementing
partial object binding (offset, lenght) from vma down to [un]binding
in ppgtt might be a lot of work to include in this SVM patch series.
I believe we need the partial object binding in non-SVM scenario
as well?

  Yes, the Vulkan APIs require both partial binding and aliasing.
  It may be worth pointing out that we're already doing some of this stuff
  today, although in a rather backwards way.  Our non-softpin model for
  Vulkan uses a memfd which we then map in userspace and turn into a BO via
  userptr.  Due to the way we handle threading in the driver, we end up with
  multiple BOs pointing to the same overlapping range in the memfd and hence
  the same pages.  That doesn't mean that everything in the path is already
  set up for what you need but the VA -> page mappings should be.  Also,
  avoiding these kinds of shinanigans is exactly why we want a "real" kernel
  API for this. :-)
   


Ok thanks Jason for the explantion.
Will look into supporting this here.



Ok, let me change the interface as below.

struct drm_i915_gem_vm_bind_va
{
/** VA start to bind **/
__u64 start;

/** Offset in Object to bind for I915_GEM_VM_BIND_SVM_OBJ type
**/
__u64 offset;

/** VA length to [un]bind **/
__u64 length;

/** Type of memory to [un]bind **/
__u32 type;
#define I915_GEM_VM_BIND_SVM_OBJ  0
#define I915_GEM_VM_BIND_SVM_BUFFER   1

/** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ type
**/
__u32 handle;

/** Flags **/
__u32 flags;
#define I915_GEM_VM_BIND_UNBIND  (1 << 0)
#define I915_GEM_VM_BIND_READONLY(1 << 1)
}

struct drm_i915_gem_vm_

Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-18 Thread Niranjana Vishwanathapura

On Sun, Dec 15, 2019 at 08:15:24PM -0800, Niranjan Vishwanathapura wrote:

On Sat, Dec 14, 2019 at 10:56:54AM +, Chris Wilson wrote:

Quoting Niranjana Vishwanathapura (2019-12-13 21:56:04)

Shared Virtual Memory (SVM) runtime allocator support allows
binding a shared virtual address to a buffer object (BO) in the
device page table through an ioctl call.

Cc: Joonas Lahtinen 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Sudeep Dutt 
Signed-off-by: Niranjana Vishwanathapura 
---
drivers/gpu/drm/i915/Kconfig  | 11 
drivers/gpu/drm/i915/Makefile |  3 +
.../gpu/drm/i915/gem/i915_gem_execbuffer.c| 58 ++
drivers/gpu/drm/i915/gem/i915_gem_svm.c   | 60 +++
drivers/gpu/drm/i915/gem/i915_gem_svm.h   | 22 +++
drivers/gpu/drm/i915/i915_drv.c   | 21 +++
drivers/gpu/drm/i915/i915_drv.h   | 22 +++
drivers/gpu/drm/i915/i915_gem_gtt.c   |  1 +
drivers/gpu/drm/i915/i915_gem_gtt.h   | 13 
include/uapi/drm/i915_drm.h   | 27 +
10 files changed, 227 insertions(+), 11 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.c
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.h

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index ba9595960bbe..c2e48710eec8 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -137,6 +137,16 @@ config DRM_I915_GVT_KVMGT
 Choose this option if you want to enable KVMGT support for
 Intel GVT-g.

+config DRM_I915_SVM
+   bool "Enable Shared Virtual Memory support in i915"
+   depends on STAGING
+   depends on DRM_I915
+   default n
+   help
+ Choose this option if you want Shared Virtual Memory (SVM)
+ support in i915. With SVM support, one can share the virtual
+ address space between a process and the GPU.
+
menu "drm/i915 Debugging"
depends on DRM_I915
depends on EXPERT
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e0fd10c0cfb8..75fe45633779 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -153,6 +153,9 @@ i915-y += \
 intel_region_lmem.o \
 intel_wopcm.o

+# SVM code
+i915-$(CONFIG_DRM_I915_SVM) += gem/i915_gem_svm.o
+
# general-purpose microcontroller (GuC) support
obj-y += gt/uc/
i915-y += gt/uc/intel_uc.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5003e616a1ad..af360238a392 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2836,10 +2836,14 @@ int
i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file)
{
+   struct drm_i915_gem_exec_object2 *exec2_list, *exec2_list_user;
   struct drm_i915_gem_execbuffer2 *args = data;
-   struct drm_i915_gem_exec_object2 *exec2_list;
-   struct drm_syncobj **fences = NULL;
   const size_t count = args->buffer_count;
+   struct drm_syncobj **fences = NULL;
+   unsigned int i = 0, svm_count = 0;
+   struct i915_address_space *vm;
+   struct i915_gem_context *ctx;
+   struct i915_svm_obj *svm_obj;
   int err;

   if (!check_buffer_count(count)) {
@@ -2851,15 +2855,46 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void 
*data,
   if (err)
   return err;

+   ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
+   if (!ctx || !rcu_access_pointer(ctx->vm))
+   return -ENOENT;


This is just hopelessly wrong.

For persistence, the _ce_->vm will have a list of must-be-present
vma, with a flag for whether they need prefaulting (!svm everything must
be prefaulted obviously). Then during reservation we ensure that all those
persistent vma are in place (so we probably use an eviction list to keep
track of those we need to instantiate on this execbuf). We don't even
want to individually track activity on those vma, preferring to assume
they are used by every request and so on change they need serialising
[for explicit uAPI unbind, where possible we strive to do it async for
endless, or at least sync against iova semaphore] against the last request
in the vm (so we need a vm->active). However, we do need an EXT_EXTENSION
to mark writes for implicit fencing (e.g.  exported dmabuf) to replace
the information lost from execobject[]



I did not understand some points above.
I am no expert here, and appreciate the feedback.
My understanding is that [excluding endless batch buffer scenario which
is not supported in this patch series,] VM_BIND is no different than the
soft-pinning of objects we have today in the execbuf path. Hence the idea
here is to add those VM_BIND objects to the execobject[] and let the
execbuffer path to take care of the rest. Persistence of bindings across
multiple requests is something no

Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-17 Thread Jason Ekstrand
On Sun, Dec 15, 2019 at 10:24 PM Niranjan Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Sat, Dec 14, 2019 at 10:31:37AM +, Chris Wilson wrote:
> >Quoting Jason Ekstrand (2019-12-14 00:36:19)
> >> On Fri, Dec 13, 2019 at 5:24 PM Niranjan Vishwanathapura <
> >> niranjana.vishwanathap...@intel.com> wrote:
> >>
> >> On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:
> >> >
> >> > +/**
> >> > + * struct drm_i915_gem_vm_bind
> >> > + *
> >> > + * Bind an object in a vm's page table.
> >> >
> >> >   First off, this is something I've wanted for a while for
> Vulkan, it's
> >> just
> >> >   never made its way high enough up the priority list.  However,
> it's
> >> going
> >> >   to have to come one way or another soon.  I'm glad to see
> kernel API
> >> for
> >> >   this being proposed.
> >> >   I do, however, have a few high-level comments/questions about
> the API:
> >> >1. In order to be useful for sparse memory support, the API
> has to go
> >> the
> >> >   other way around so that it binds a VA range to a range within
> the BO.
> >> It
> >> >   also needs to be able to handle overlapping where two different
> VA
> >> ranges
> >> >   may map to the same underlying bytes in the BO.  This likely
> means that
> >> >   unbind needs to also take a VA range and only unbind that range.
> >> >2. If this is going to be useful for managing GL's address
> space where
> >> we
> >> >   have lots of BOs, we probably want it to take a list of ranges
> so we
> >> >   aren't making one ioctl for each thing we want to bind.
> >>
> >> Hi Jason,
> >>
> >> Yah, some of these requirements came up.
> >>
> >>
> >> Yes, I have raised them every single time an API like this has come
> across my
> >> e-mail inbox for years and they continue to get ignored.  Why are we
> landing an
> >> API that we know isn't the API we want especially when it's pretty
> obvious
> >> roughly what the API we want is?  It may be less time in the short
> term, but
> >> long-term it means two ioctls and two implementations in i915, IGT
> tests for
> >> both code paths, and code in all UMDs to call one or the other
> depending on
> >> what kernel you're running on, and we have to maintain all that code
> going
> >> forward forever.  Sure, that's a price we pay today for a variety of
> things but
> >> that's because they all seemed like the right thing at the time.
> Landing the
> >> wrong API when we know it's the wrong API seems foolish.
> >
> >Exactly. This is not even close to the uAPI we need. Reposting an RFC
> >without taking in the concerns last time (or the time before that, or
> >the time before that...) suggests that you aren't really requesting for
> >comments at all.
>
> Thanks Jason for detailed exlanation.
> Chris, all comments and guidance are much appreciated :)
>
> I haven't looked in detail, but my concern is that implementing
> partial object binding (offset, lenght) from vma down to [un]binding
> in ppgtt might be a lot of work to include in this SVM patch series.
> I believe we need the partial object binding in non-SVM scenario
> as well?
>

Yes, the Vulkan APIs require both partial binding and aliasing.

It may be worth pointing out that we're already doing some of this stuff
today, although in a rather backwards way.  Our non-softpin model for
Vulkan uses a memfd which we then map in userspace and turn into a BO via
userptr.  Due to the way we handle threading in the driver, we end up with
multiple BOs pointing to the same overlapping range in the memfd and hence
the same pages.  That doesn't mean that everything in the path is already
set up for what you need but the VA -> page mappings should be.  Also,
avoiding these kinds of shinanigans is exactly why we want a "real" kernel
API for this. :-)


> Ok, let me change the interface as below.
>
> struct drm_i915_gem_vm_bind_va
> {
> /** VA start to bind **/
> __u64 start;
>
> /** Offset in Object to bind for I915_GEM_VM_BIND_SVM_OBJ type **/
> __u64 offset;
>
> /** VA length to [un]bind **/
> __u64 length;
>
> /** Type of memory to [un]bind **/
> __u32 type;
> #define I915_GEM_VM_BIND_SVM_OBJ  0
> #define I915_GEM_VM_BIND_SVM_BUFFER   1
>
> /** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ type **/
> __u32 handle;
>
> /** Flags **/
> __u32 flags;
> #define I915_GEM_VM_BIND_UNBIND  (1 << 0)
> #define I915_GEM_VM_BIND_READONLY(1 << 1)
> }
>
> struct drm_i915_gem_vm_bind {
> /** vm to [un]bind **/
> __u32 vm_id;
>
> /** number of VAs to bind **/
> __u32 num_vas;
>
> /** Array of VAs to bind **/
> struct drm_i915_gem_vm_bind_va *bind_vas;
>
> /** User extensions **/
> __u64 extensions;
> };
>
> When synchronization control is added as ext

Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-15 Thread Niranjan Vishwanathapura

On Sat, Dec 14, 2019 at 10:56:54AM +, Chris Wilson wrote:

Quoting Niranjana Vishwanathapura (2019-12-13 21:56:04)

Shared Virtual Memory (SVM) runtime allocator support allows
binding a shared virtual address to a buffer object (BO) in the
device page table through an ioctl call.

Cc: Joonas Lahtinen 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Sudeep Dutt 
Signed-off-by: Niranjana Vishwanathapura 
---
 drivers/gpu/drm/i915/Kconfig  | 11 
 drivers/gpu/drm/i915/Makefile |  3 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 58 ++
 drivers/gpu/drm/i915/gem/i915_gem_svm.c   | 60 +++
 drivers/gpu/drm/i915/gem/i915_gem_svm.h   | 22 +++
 drivers/gpu/drm/i915/i915_drv.c   | 21 +++
 drivers/gpu/drm/i915/i915_drv.h   | 22 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  1 +
 drivers/gpu/drm/i915/i915_gem_gtt.h   | 13 
 include/uapi/drm/i915_drm.h   | 27 +
 10 files changed, 227 insertions(+), 11 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.h

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index ba9595960bbe..c2e48710eec8 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -137,6 +137,16 @@ config DRM_I915_GVT_KVMGT
  Choose this option if you want to enable KVMGT support for
  Intel GVT-g.

+config DRM_I915_SVM
+   bool "Enable Shared Virtual Memory support in i915"
+   depends on STAGING
+   depends on DRM_I915
+   default n
+   help
+ Choose this option if you want Shared Virtual Memory (SVM)
+ support in i915. With SVM support, one can share the virtual
+ address space between a process and the GPU.
+
 menu "drm/i915 Debugging"
 depends on DRM_I915
 depends on EXPERT
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e0fd10c0cfb8..75fe45633779 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -153,6 +153,9 @@ i915-y += \
  intel_region_lmem.o \
  intel_wopcm.o

+# SVM code
+i915-$(CONFIG_DRM_I915_SVM) += gem/i915_gem_svm.o
+
 # general-purpose microcontroller (GuC) support
 obj-y += gt/uc/
 i915-y += gt/uc/intel_uc.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5003e616a1ad..af360238a392 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2836,10 +2836,14 @@ int
 i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file)
 {
+   struct drm_i915_gem_exec_object2 *exec2_list, *exec2_list_user;
struct drm_i915_gem_execbuffer2 *args = data;
-   struct drm_i915_gem_exec_object2 *exec2_list;
-   struct drm_syncobj **fences = NULL;
const size_t count = args->buffer_count;
+   struct drm_syncobj **fences = NULL;
+   unsigned int i = 0, svm_count = 0;
+   struct i915_address_space *vm;
+   struct i915_gem_context *ctx;
+   struct i915_svm_obj *svm_obj;
int err;

if (!check_buffer_count(count)) {
@@ -2851,15 +2855,46 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void 
*data,
if (err)
return err;

+   ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
+   if (!ctx || !rcu_access_pointer(ctx->vm))
+   return -ENOENT;


This is just hopelessly wrong.

For persistence, the _ce_->vm will have a list of must-be-present
vma, with a flag for whether they need prefaulting (!svm everything must
be prefaulted obviously). Then during reservation we ensure that all those
persistent vma are in place (so we probably use an eviction list to keep
track of those we need to instantiate on this execbuf). We don't even
want to individually track activity on those vma, preferring to assume
they are used by every request and so on change they need serialising
[for explicit uAPI unbind, where possible we strive to do it async for
endless, or at least sync against iova semaphore] against the last request
in the vm (so we need a vm->active). However, we do need an EXT_EXTENSION
to mark writes for implicit fencing (e.g.  exported dmabuf) to replace
the information lost from execobject[]



I did not understand some points above.
I am no expert here, and appreciate the feedback.
My understanding is that [excluding endless batch buffer scenario which
is not supported in this patch series,] VM_BIND is no different than the
soft-pinning of objects we have today in the execbuf path. Hence the idea
here is to add those VM_BIND objects to the execobject[] and let the
execbuffer path to take care of the rest. Persistence of bindings across
multiple requests is something not considered. Do we need this flag in
execo

Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-15 Thread Niranjan Vishwanathapura

On Sat, Dec 14, 2019 at 10:31:37AM +, Chris Wilson wrote:

Quoting Jason Ekstrand (2019-12-14 00:36:19)

On Fri, Dec 13, 2019 at 5:24 PM Niranjan Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:
>
>     +/**
>     + * struct drm_i915_gem_vm_bind
>     + *
>     + * Bind an object in a vm's page table.
>
>   First off, this is something I've wanted for a while for Vulkan, it's
just
>   never made its way high enough up the priority list.  However, it's
going
>   to have to come one way or another soon.  I'm glad to see kernel API
for
>   this being proposed.
>   I do, however, have a few high-level comments/questions about the API:
>    1. In order to be useful for sparse memory support, the API has to go
the
>   other way around so that it binds a VA range to a range within the BO. 
It
>   also needs to be able to handle overlapping where two different VA
ranges
>   may map to the same underlying bytes in the BO.  This likely means that
>   unbind needs to also take a VA range and only unbind that range.
>    2. If this is going to be useful for managing GL's address space where
we
>   have lots of BOs, we probably want it to take a list of ranges so we
>   aren't making one ioctl for each thing we want to bind.

Hi Jason,

Yah, some of these requirements came up.

 
Yes, I have raised them every single time an API like this has come across my
e-mail inbox for years and they continue to get ignored.  Why are we landing an
API that we know isn't the API we want especially when it's pretty obvious
roughly what the API we want is?  It may be less time in the short term, but
long-term it means two ioctls and two implementations in i915, IGT tests for
both code paths, and code in all UMDs to call one or the other depending on
what kernel you're running on, and we have to maintain all that code going
forward forever.  Sure, that's a price we pay today for a variety of things but
that's because they all seemed like the right thing at the time.  Landing the
wrong API when we know it's the wrong API seems foolish.


Exactly. This is not even close to the uAPI we need. Reposting an RFC
without taking in the concerns last time (or the time before that, or
the time before that...) suggests that you aren't really requesting for
comments at all.


Thanks Jason for detailed exlanation.
Chris, all comments and guidance are much appreciated :)

I haven't looked in detail, but my concern is that implementing
partial object binding (offset, lenght) from vma down to [un]binding
in ppgtt might be a lot of work to include in this SVM patch series.
I believe we need the partial object binding in non-SVM scenario
as well?

Ok, let me change the interface as below.

struct drm_i915_gem_vm_bind_va
{
   /** VA start to bind **/
   __u64 start;

   /** Offset in Object to bind for I915_GEM_VM_BIND_SVM_OBJ type **/
   __u64 offset;

   /** VA length to [un]bind **/
   __u64 length;

   /** Type of memory to [un]bind **/
   __u32 type;
#define I915_GEM_VM_BIND_SVM_OBJ  0
#define I915_GEM_VM_BIND_SVM_BUFFER   1

   /** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ type **/
   __u32 handle;

   /** Flags **/
   __u32 flags;
#define I915_GEM_VM_BIND_UNBIND  (1 << 0)
#define I915_GEM_VM_BIND_READONLY(1 << 1)
}

struct drm_i915_gem_vm_bind {
   /** vm to [un]bind **/
   __u32 vm_id;

/** number of VAs to bind **/
__u32 num_vas;

/** Array of VAs to bind **/
struct drm_i915_gem_vm_bind_va *bind_vas;

/** User extensions **/
   __u64 extensions;
};

When synchronization control is added as extension, it applies to all VAs in 
the array.
Does this looks good?

Niranjana


-Chris

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-14 Thread Chris Wilson
Quoting Niranjana Vishwanathapura (2019-12-13 21:56:04)
> Shared Virtual Memory (SVM) runtime allocator support allows
> binding a shared virtual address to a buffer object (BO) in the
> device page table through an ioctl call.
> 
> Cc: Joonas Lahtinen 
> Cc: Jon Bloomfield 
> Cc: Daniel Vetter 
> Cc: Sudeep Dutt 
> Signed-off-by: Niranjana Vishwanathapura 
> ---
>  drivers/gpu/drm/i915/Kconfig  | 11 
>  drivers/gpu/drm/i915/Makefile |  3 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 58 ++
>  drivers/gpu/drm/i915/gem/i915_gem_svm.c   | 60 +++
>  drivers/gpu/drm/i915/gem/i915_gem_svm.h   | 22 +++
>  drivers/gpu/drm/i915/i915_drv.c   | 21 +++
>  drivers/gpu/drm/i915/i915_drv.h   | 22 +++
>  drivers/gpu/drm/i915/i915_gem_gtt.c   |  1 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h   | 13 
>  include/uapi/drm/i915_drm.h   | 27 +
>  10 files changed, 227 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.c
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.h
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index ba9595960bbe..c2e48710eec8 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -137,6 +137,16 @@ config DRM_I915_GVT_KVMGT
>   Choose this option if you want to enable KVMGT support for
>   Intel GVT-g.
>  
> +config DRM_I915_SVM
> +   bool "Enable Shared Virtual Memory support in i915"
> +   depends on STAGING
> +   depends on DRM_I915
> +   default n
> +   help
> + Choose this option if you want Shared Virtual Memory (SVM)
> + support in i915. With SVM support, one can share the virtual
> + address space between a process and the GPU.
> +
>  menu "drm/i915 Debugging"
>  depends on DRM_I915
>  depends on EXPERT
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index e0fd10c0cfb8..75fe45633779 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -153,6 +153,9 @@ i915-y += \
>   intel_region_lmem.o \
>   intel_wopcm.o
>  
> +# SVM code
> +i915-$(CONFIG_DRM_I915_SVM) += gem/i915_gem_svm.o
> +
>  # general-purpose microcontroller (GuC) support
>  obj-y += gt/uc/
>  i915-y += gt/uc/intel_uc.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5003e616a1ad..af360238a392 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2836,10 +2836,14 @@ int
>  i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>struct drm_file *file)
>  {
> +   struct drm_i915_gem_exec_object2 *exec2_list, *exec2_list_user;
> struct drm_i915_gem_execbuffer2 *args = data;
> -   struct drm_i915_gem_exec_object2 *exec2_list;
> -   struct drm_syncobj **fences = NULL;
> const size_t count = args->buffer_count;
> +   struct drm_syncobj **fences = NULL;
> +   unsigned int i = 0, svm_count = 0;
> +   struct i915_address_space *vm;
> +   struct i915_gem_context *ctx;
> +   struct i915_svm_obj *svm_obj;
> int err;
>  
> if (!check_buffer_count(count)) {
> @@ -2851,15 +2855,46 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, 
> void *data,
> if (err)
> return err;
>  
> +   ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
> +   if (!ctx || !rcu_access_pointer(ctx->vm))
> +   return -ENOENT;

This is just hopelessly wrong.

For persistence, the _ce_->vm will have a list of must-be-present
vma, with a flag for whether they need prefaulting (!svm everything must
be prefaulted obviously). Then during reservation we ensure that all those
persistent vma are in place (so we probably use an eviction list to keep
track of those we need to instantiate on this execbuf). We don't even
want to individually track activity on those vma, preferring to assume
they are used by every request and so on change they need serialising
[for explicit uAPI unbind, where possible we strive to do it async for
endless, or at least sync against iova semaphore] against the last request
in the vm (so we need a vm->active). However, we do need an EXT_EXTENSION
to mark writes for implicit fencing (e.g.  exported dmabuf) to replace
the information lost from execobject[]

> +struct drm_i915_gem_vm_bind {
> +   /** VA start to bind **/
> +   __u64 start;

iova;
offset; /* into handle */
length; /* from offset */

> +
> +   /** Type of memory to [un]bind **/
> +   __u32 type;
> +#define I915_GEM_VM_BIND_SVM_OBJ  0
> +
> +   /** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ type **/
> +   __u32 handle;
> +
> +   /** vm to [un]bind **/
> +  

Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-14 Thread Chris Wilson
Quoting Jason Ekstrand (2019-12-14 00:36:19)
> On Fri, Dec 13, 2019 at 5:24 PM Niranjan Vishwanathapura <
> niranjana.vishwanathap...@intel.com> wrote:
> 
> On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:
> >
> >     +/**
> >     + * struct drm_i915_gem_vm_bind
> >     + *
> >     + * Bind an object in a vm's page table.
> >
> >   First off, this is something I've wanted for a while for Vulkan, it's
> just
> >   never made its way high enough up the priority list.  However, it's
> going
> >   to have to come one way or another soon.  I'm glad to see kernel API
> for
> >   this being proposed.
> >   I do, however, have a few high-level comments/questions about the API:
> >    1. In order to be useful for sparse memory support, the API has to go
> the
> >   other way around so that it binds a VA range to a range within the 
> BO. 
> It
> >   also needs to be able to handle overlapping where two different VA
> ranges
> >   may map to the same underlying bytes in the BO.  This likely means 
> that
> >   unbind needs to also take a VA range and only unbind that range.
> >    2. If this is going to be useful for managing GL's address space 
> where
> we
> >   have lots of BOs, we probably want it to take a list of ranges so we
> >   aren't making one ioctl for each thing we want to bind.
> 
> Hi Jason,
> 
> Yah, some of these requirements came up.
> 
>  
> Yes, I have raised them every single time an API like this has come across my
> e-mail inbox for years and they continue to get ignored.  Why are we landing 
> an
> API that we know isn't the API we want especially when it's pretty obvious
> roughly what the API we want is?  It may be less time in the short term, but
> long-term it means two ioctls and two implementations in i915, IGT tests for
> both code paths, and code in all UMDs to call one or the other depending on
> what kernel you're running on, and we have to maintain all that code going
> forward forever.  Sure, that's a price we pay today for a variety of things 
> but
> that's because they all seemed like the right thing at the time.  Landing the
> wrong API when we know it's the wrong API seems foolish.

Exactly. This is not even close to the uAPI we need. Reposting an RFC
without taking in the concerns last time (or the time before that, or
the time before that...) suggests that you aren't really requesting for
comments at all.
-Chris
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-13 Thread Jason Ekstrand
On Fri, Dec 13, 2019 at 5:24 PM Niranjan Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:
> >
> > +/**
> > + * struct drm_i915_gem_vm_bind
> > + *
> > + * Bind an object in a vm's page table.
> >
> >   First off, this is something I've wanted for a while for Vulkan, it's
> just
> >   never made its way high enough up the priority list.  However, it's
> going
> >   to have to come one way or another soon.  I'm glad to see kernel API
> for
> >   this being proposed.
> >   I do, however, have a few high-level comments/questions about the API:
> >1. In order to be useful for sparse memory support, the API has to go
> the
> >   other way around so that it binds a VA range to a range within the
> BO.  It
> >   also needs to be able to handle overlapping where two different VA
> ranges
> >   may map to the same underlying bytes in the BO.  This likely means that
> >   unbind needs to also take a VA range and only unbind that range.
> >2. If this is going to be useful for managing GL's address space
> where we
> >   have lots of BOs, we probably want it to take a list of ranges so we
> >   aren't making one ioctl for each thing we want to bind.
>
> Hi Jason,
>
> Yah, some of these requirements came up.
>

Yes, I have raised them every single time an API like this has come across
my e-mail inbox for years and they continue to get ignored.  Why are we
landing an API that we know isn't the API we want especially when it's
pretty obvious roughly what the API we want is?  It may be less time in the
short term, but long-term it means two ioctls and two implementations in
i915, IGT tests for both code paths, and code in all UMDs to call one or
the other depending on what kernel you're running on, and we have to
maintain all that code going forward forever.  Sure, that's a price we pay
today for a variety of things but that's because they all seemed like the
right thing at the time.  Landing the wrong API when we know it's the wrong
API seems foolish.

They are not being done here due to time and effort involved in defining
> those requirements, implementing and validating.
>

For #1, yes, it would require more effort but for #2, it really doesn't
take any extra effort to make it take an array...


> However, this ioctl can be extended in a backward compatible way to handle
> those requirements if required.
>
> >3. Why are there no ways to synchronize this with anything?  For
> binding,
> >   this probably isn't really needed as long as the VA range you're
> binding
> >   is empty.  However, if you want to move bindings around or unbind
> >   something, the only option is to block in userspace and then call
> >   bind/unbind.  This can be done but it means even more threads in the
> UMD
> >   which is unpleasant.  One could argue that that's more or less what the
> >   kernel is going to have to do so we may as well do it in userspace.
> >   However, I'm not 100% convinced that's true.
> >   --Jason
> >
>
> Yah, that is the thought.
> But as SVM feature evolves, I think we can consider handling some such
> cases
> if hadling those in driver does make whole lot sense.
>

Sparse binding exists as a feature.  It's been in D3D for some time and
it's in Vulkan.  We pretty much know what the requirements are.  If you go
look at how it's supposed to work in Vulkan, you have a binding queue and
it waits on semaphores before [un]binding and signals semaphores after
[un]binding.  The biggest problem from an API (as opposed to
implementation) POV with doing that in i915 is that we have too many
synchronization primitives to choose from. :-(

--Jason



> Thanks,
> Niranjana
>
> >
> > + */
> > +struct drm_i915_gem_vm_bind {
> > +   /** VA start to bind **/
> > +   __u64 start;
> > +
> > +   /** Type of memory to [un]bind **/
> > +   __u32 type;
> > +#define I915_GEM_VM_BIND_SVM_OBJ  0
> > +
> > +   /** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ
> type
> > **/
> > +   __u32 handle;
> > +
> > +   /** vm to [un]bind **/
> > +   __u32 vm_id;
> > +
> > +   /** Flags **/
> > +   __u32 flags;
> > +#define I915_GEM_VM_BIND_UNBIND  (1 << 0)
> > +#define I915_GEM_VM_BIND_READONLY(1 << 1)
> > +};
> > +
> >  #if defined(__cplusplus)
> >  }
> >  #endif
> > --
> > 2.21.0.rc0.32.g243a4c7e27
> >
> > ___
> > Intel-gfx mailing list
> > intel-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-13 Thread Niranjan Vishwanathapura

On Fri, Dec 13, 2019 at 04:58:42PM -0600, Jason Ekstrand wrote:


+/**
+ * struct drm_i915_gem_vm_bind
+ *
+ * Bind an object in a vm's page table.

  First off, this is something I've wanted for a while for Vulkan, it's just
  never made its way high enough up the priority list.  However, it's going
  to have to come one way or another soon.  I'm glad to see kernel API for
  this being proposed.
  I do, however, have a few high-level comments/questions about the API:
   1. In order to be useful for sparse memory support, the API has to go the
  other way around so that it binds a VA range to a range within the BO.  It
  also needs to be able to handle overlapping where two different VA ranges
  may map to the same underlying bytes in the BO.  This likely means that
  unbind needs to also take a VA range and only unbind that range.
   2. If this is going to be useful for managing GL's address space where we
  have lots of BOs, we probably want it to take a list of ranges so we
  aren't making one ioctl for each thing we want to bind.


Hi Jason,

Yah, some of these requirements came up.
They are not being done here due to time and effort involved in defining
those requirements, implementing and validating.

However, this ioctl can be extended in a backward compatible way to handle
those requirements if required.


   3. Why are there no ways to synchronize this with anything?  For binding,
  this probably isn't really needed as long as the VA range you're binding
  is empty.  However, if you want to move bindings around or unbind
  something, the only option is to block in userspace and then call
  bind/unbind.  This can be done but it means even more threads in the UMD
  which is unpleasant.  One could argue that that's more or less what the
  kernel is going to have to do so we may as well do it in userspace. 
  However, I'm not 100% convinced that's true.

  --Jason



Yah, that is the thought.
But as SVM feature evolves, I think we can consider handling some such cases
if hadling those in driver does make whole lot sense. 


Thanks,
Niranjana



+ */
+struct drm_i915_gem_vm_bind {
+   /** VA start to bind **/
+   __u64 start;
+
+   /** Type of memory to [un]bind **/
+   __u32 type;
+#define I915_GEM_VM_BIND_SVM_OBJ  0
+
+   /** Object handle to [un]bind for I915_GEM_VM_BIND_SVM_OBJ type
**/
+   __u32 handle;
+
+   /** vm to [un]bind **/
+   __u32 vm_id;
+
+   /** Flags **/
+   __u32 flags;
+#define I915_GEM_VM_BIND_UNBIND  (1 << 0)
+#define I915_GEM_VM_BIND_READONLY(1 << 1)
+};
+
 #if defined(__cplusplus)
 }
 #endif
--
2.21.0.rc0.32.g243a4c7e27

___
Intel-gfx mailing list
intel-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [RFC v2 02/12] drm/i915/svm: Runtime (RT) allocator support

2019-12-13 Thread Jason Ekstrand
On Fri, Dec 13, 2019 at 4:07 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> Shared Virtual Memory (SVM) runtime allocator support allows
> binding a shared virtual address to a buffer object (BO) in the
> device page table through an ioctl call.
>
> Cc: Joonas Lahtinen 
> Cc: Jon Bloomfield 
> Cc: Daniel Vetter 
> Cc: Sudeep Dutt 
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathap...@intel.com>
> ---
>  drivers/gpu/drm/i915/Kconfig  | 11 
>  drivers/gpu/drm/i915/Makefile |  3 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 58 ++
>  drivers/gpu/drm/i915/gem/i915_gem_svm.c   | 60 +++
>  drivers/gpu/drm/i915/gem/i915_gem_svm.h   | 22 +++
>  drivers/gpu/drm/i915/i915_drv.c   | 21 +++
>  drivers/gpu/drm/i915/i915_drv.h   | 22 +++
>  drivers/gpu/drm/i915/i915_gem_gtt.c   |  1 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h   | 13 
>  include/uapi/drm/i915_drm.h   | 27 +
>  10 files changed, 227 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.c
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_svm.h
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index ba9595960bbe..c2e48710eec8 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -137,6 +137,16 @@ config DRM_I915_GVT_KVMGT
>   Choose this option if you want to enable KVMGT support for
>   Intel GVT-g.
>
> +config DRM_I915_SVM
> +   bool "Enable Shared Virtual Memory support in i915"
> +   depends on STAGING
> +   depends on DRM_I915
> +   default n
> +   help
> + Choose this option if you want Shared Virtual Memory (SVM)
> + support in i915. With SVM support, one can share the virtual
> + address space between a process and the GPU.
> +
>  menu "drm/i915 Debugging"
>  depends on DRM_I915
>  depends on EXPERT
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index e0fd10c0cfb8..75fe45633779 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -153,6 +153,9 @@ i915-y += \
>   intel_region_lmem.o \
>   intel_wopcm.o
>
> +# SVM code
> +i915-$(CONFIG_DRM_I915_SVM) += gem/i915_gem_svm.o
> +
>  # general-purpose microcontroller (GuC) support
>  obj-y += gt/uc/
>  i915-y += gt/uc/intel_uc.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5003e616a1ad..af360238a392 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2836,10 +2836,14 @@ int
>  i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>struct drm_file *file)
>  {
> +   struct drm_i915_gem_exec_object2 *exec2_list, *exec2_list_user;
> struct drm_i915_gem_execbuffer2 *args = data;
> -   struct drm_i915_gem_exec_object2 *exec2_list;
> -   struct drm_syncobj **fences = NULL;
> const size_t count = args->buffer_count;
> +   struct drm_syncobj **fences = NULL;
> +   unsigned int i = 0, svm_count = 0;
> +   struct i915_address_space *vm;
> +   struct i915_gem_context *ctx;
> +   struct i915_svm_obj *svm_obj;
> int err;
>
> if (!check_buffer_count(count)) {
> @@ -2851,15 +2855,46 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev,
> void *data,
> if (err)
> return err;
>
> +   ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
> +   if (!ctx || !rcu_access_pointer(ctx->vm))
> +   return -ENOENT;
> +
> +   rcu_read_lock();
> +   vm = i915_vm_get(ctx->vm);
> +   rcu_read_unlock();
> +
> +alloc_again:
> +   svm_count = vm->svm_count;
> /* Allocate an extra slot for use by the command parser */
> -   exec2_list = kvmalloc_array(count + 1, eb_element_size(),
> +   exec2_list = kvmalloc_array(count + svm_count + 1,
> eb_element_size(),
> __GFP_NOWARN | GFP_KERNEL);
> if (exec2_list == NULL) {
> DRM_DEBUG("Failed to allocate exec list for %zd buffers\n",
> - count);
> + count + svm_count);
> return -ENOMEM;
> }
> -   if (copy_from_user(exec2_list,
> +   mutex_lock(&vm->mutex);
> +   if (svm_count != vm->svm_count) {
> +   mutex_unlock(&vm->mutex);
> +   kvfree(exec2_list);
> +   goto alloc_again;
> +   }
> +
> +   list_for_each_entry(svm_obj, &vm->svm_list, link) {
> +   memset(&exec2_list[i], 0, sizeof(*exec2_list));
> +   exec2_list[i].handle = svm_obj->handle;
> +   exec2_list[i].offset = svm_obj->offset;
> +