Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-28 Thread Zeng, Oak


Thanks,
Oak

> -Original Message-
> From: Tvrtko Ursulin 
> Sent: June 28, 2022 4:58 AM
> To: Zeng, Oak ; Landwerlin, Lionel G
> ; Vishwanathapura, Niranjana
> 
> Cc: Zanoni, Paulo R ; intel-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Hellstrom,
> Thomas ; Wilson, Chris P
> ; Vetter, Daniel ;
> christian.koe...@amd.com; Auld, Matthew 
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 27/06/2022 19:58, Zeng, Oak wrote:
> >
> >
> > Thanks,
> > Oak
> >
> >> -Original Message-
> >> From: Tvrtko Ursulin 
> >> Sent: June 27, 2022 4:30 AM
> >> To: Zeng, Oak ; Landwerlin, Lionel G
> >> ; Vishwanathapura, Niranjana
> >> 
> >> Cc: Zanoni, Paulo R ; intel-
> >> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> >> Hellstrom, Thomas ; Wilson, Chris P
> >> ; Vetter, Daniel ;
> >> christian.koe...@amd.com; Auld, Matthew 
> >> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
> >> definition
> >>
> >>
> >> On 24/06/2022 21:23, Zeng, Oak wrote:
> >>> Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at
> >>> backing storage release":
> >>>
> >>> Correctness:
> >>> Consider this sequence:
> >>> 1. Unbind va1 from pa1.
> >>> 2. Then bind va1 to pa2. // user space is free to do this, as it
> >>> manages the virtual address space.
> >>> 3. Submit shader code using va1.
> >>> 4. Then retire pa1.
> >>>
> >>> If you don't perform a tlb invalidate at step #1, then in step #3 the
> >>> shader will use stale tlb entries and pa1 will be used by the shader.
> >>> The user wants pa2 to be used. So I don't think invalidating the tlb
> >>> at step #4 ensures correctness.
> >>
> >> Define step 3. Is it a new execbuf? If so then there will be a TLB flush
> >> there. Unless the plan is to stop doing that with eb3 but I haven't
> >> picked up on that anywhere so far.
> >
> > In Niranjana's latest patch series, he removed the TLB flushing from
> > vm_unbind. He also said explicitly that TLB invalidation will be performed
> > at job submission and backing storage release time, which is the existing
> > behavior of the current i915 driver.
> >
> > I think if we invalidate the TLB on each vm_unbind, then we don't need to
> > invalidate at submission and backing storage release. It doesn't make a
> > lot of sense to me to perform a tlb invalidation at execbuf time; maybe it
> > is a behavior left over from the old implicit binding programming model.
> > For vm_bind and eb3, we separate binding and job submission into two APIs.
> > It is more natural for the TLB invalidation to be coupled with vm
> > bind/unbind, not job submission. So in my opinion we should remove tlb
> > invalidation from submission and backing storage release and add it to vm
> > unbind. This method is cleaner to me.
> 
> You can propose this model (not flushing in eb3) but I have my doubts.
> Consider the pointlessness of flushing on N unbinds for the 99% of clients
> which are not running infinite compute batches. And consider how you make
> the behaviour consistent on all platforms (selective vs global tlb flush).

When I thought about eb3, compute workloads and ULLS were also in the picture.
Under ULLS, user mode keeps submitting jobs without calling execbuf (it uses a
semaphore to notify the HW of a new batch), as sketched below. The execbuf +
backing release flush therefore has a correctness issue, as I pointed out. Now
that we have decided eb3 is only for Mesa, not for compute, we don't have this
correctness problem for now. We can close this conversation and revive it when
we move to Xe and vm_bind for compute.
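
To illustrate, a minimal sketch of a ULLS-style submission loop; every
structure and name here is hypothetical (not the actual i915 or UMD
interface), the point being only that no ioctl is entered, so an execbuf-time
TLB flush never covers these submissions:

#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-level low-latency submission (ULLS) ring. */
struct ulls_ring {
	uint64_t *batch_slots;       /* GPU-visible array of batch buffer VAs */
	_Atomic uint32_t tail;       /* producer index */
	volatile uint32_t *doorbell; /* mapped HW semaphore/doorbell page */
};

/* Append a batch and signal the HW directly; the kernel (and hence any
 * execbuf-time TLB flush) is never involved after the initial setup. */
static void ulls_submit(struct ulls_ring *ring, uint64_t batch_va)
{
	uint32_t slot = atomic_fetch_add(&ring->tail, 1);

	ring->batch_slots[slot % 256] = batch_va;
	*ring->doorbell = slot + 1; /* semaphore write the HW polls on */
}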

Regards,
Oak


> 
> Also note that this discussion is orthogonal to unbind vs backing store 
> release.
> 
> > Regarding performance, we don't have data. In my opinion, we should make
> > things work in the most straightforward way as a first step, then consider
> > performance improvements if necessary. Adopting delayed tlb invalidation
> > at submission and backing release time without performance data to support
> > it wasn't a good decision.
> 
> It is quite straightforward though. ;) It aligns with the eb2 model, and an
> argument can be made that backing store release is (much) less frequent than
> unbind (consider softpin, where a client could trigger a lot of pointless
> flushes).
> 
> Regards,
> 
> Tvrtko


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-28 Thread Tvrtko Ursulin



On 27/06/2022 19:58, Zeng, Oak wrote:



Thanks,
Oak


-Original Message-
From: Tvrtko Ursulin 
Sent: June 27, 2022 4:30 AM
To: Zeng, Oak ; Landwerlin, Lionel G
; Vishwanathapura, Niranjana

Cc: Zanoni, Paulo R ; intel-
g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Hellstrom,
Thomas ; Wilson, Chris P
; Vetter, Daniel ;
christian.koe...@amd.com; Auld, Matthew 
Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition


On 24/06/2022 21:23, Zeng, Oak wrote:

Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing

storage":


Correctness:
consider this sequence of:
1. unbind va1 from pa1,
2. then bind va1 to pa2. //user space has the freedom to do this as it
manages virtual address space 3. Submit shader code using va1, 4. Then
retire pa1.

If you don't perform tlb invalidate at step #1, in step #3, shader will use

stale entries in tlb and pa1 will be used for the shader. User want to use pa2.
So I don't think invalidate tlb at step #4 make correctness.

Define step 3. Is it a new execbuf? If so then there will be a TLB flush there.
Unless the plan is to stop doing that with eb3 but I haven't picked up on that
anywhere so far.


In Niranjana's latest patch series, he removed the TLB flushing from
vm_unbind. He also said explicitly that TLB invalidation will be performed at
job submission and backing storage release time, which is the existing
behavior of the current i915 driver.

I think if we invalidate the TLB on each vm_unbind, then we don't need to
invalidate at submission and backing storage release. It doesn't make a lot of
sense to me to perform a tlb invalidation at execbuf time; maybe it is a
behavior left over from the old implicit binding programming model. For
vm_bind and eb3, we separate binding and job submission into two APIs. It is
more natural for the TLB invalidation to be coupled with vm bind/unbind, not
job submission. So in my opinion we should remove tlb invalidation from
submission and backing storage release and add it to vm unbind. This method is
cleaner to me.


You can propose this model (not flushing in eb3) but I have my doubts.
Consider the pointlessness of flushing on N unbinds for the 99% of clients
which are not running infinite compute batches. And consider how you make the
behaviour consistent on all platforms (selective vs global tlb flush).


Also note that this discussion is orthogonal to unbind vs backing store 
release.



Regarding performance, we don't have data. In my opinion, we should make
things work in the most straightforward way as a first step, then consider
performance improvements if necessary. Adopting delayed tlb invalidation at
submission and backing release time without performance data to support it
wasn't a good decision.


It is quite straightforward though. ;) It aligns with the eb2 model, and an
argument can be made that backing store release is (much) less frequent than
unbind (consider softpin, where a client could trigger a lot of pointless
flushes).


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-27 Thread Zeng, Oak


Thanks,
Oak

> -Original Message-
> From: Tvrtko Ursulin 
> Sent: June 27, 2022 4:30 AM
> To: Zeng, Oak ; Landwerlin, Lionel G
> ; Vishwanathapura, Niranjana
> 
> Cc: Zanoni, Paulo R ; intel-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Hellstrom,
> Thomas ; Wilson, Chris P
> ; Vetter, Daniel ;
> christian.koe...@amd.com; Auld, Matthew 
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 24/06/2022 21:23, Zeng, Oak wrote:
> > Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing
> > storage release":
> >
> > Correctness:
> > Consider this sequence:
> > 1. Unbind va1 from pa1.
> > 2. Then bind va1 to pa2. // user space is free to do this, as it manages
> > the virtual address space.
> > 3. Submit shader code using va1.
> > 4. Then retire pa1.
> >
> > If you don't perform a tlb invalidate at step #1, then in step #3 the
> > shader will use stale tlb entries and pa1 will be used by the shader. The
> > user wants pa2 to be used. So I don't think invalidating the tlb at step
> > #4 ensures correctness.
> 
> Define step 3. Is it a new execbuf? If so then there will be a TLB flush 
> there.
> Unless the plan is to stop doing that with eb3 but I haven't picked up on that
> anywhere so far.

In Niranjana's latest patch series, he removed the TLB flushing from
vm_unbind. He also said explicitly that TLB invalidation will be performed at
job submission and backing storage release time, which is the existing
behavior of the current i915 driver.

I think if we invalidate the TLB on each vm_unbind, then we don't need to
invalidate at submission and backing storage release. It doesn't make a lot of
sense to me to perform a tlb invalidation at execbuf time; maybe it is a
behavior left over from the old implicit binding programming model. For
vm_bind and eb3, we separate binding and job submission into two APIs. It is
more natural for the TLB invalidation to be coupled with vm bind/unbind, not
job submission. So in my opinion we should remove tlb invalidation from
submission and backing storage release and add it to vm unbind. This method is
cleaner to me.

Regarding performance, we don't have data. In my opinion, we should make
things work in the most straightforward way as a first step, then consider
performance improvements if necessary. Adopting delayed tlb invalidation at
submission and backing release time without performance data to support it
wasn't a good decision.

Regards,
Oak

> 
> > Performance:
> > It is straightforward to invalidate the tlb at step 1. If the platform
> > supports range-based tlb invalidation, we can perform a range-based
> > invalidation easily at step 1.
> 
> If the platform supports range-based invalidation, yes. If it doesn't _and_
> the flush at unbind is not needed for 99% of use cases then it is simply a
> waste.
> 
> > If you do it at step 4, you either need to perform a whole gt tlb
> > invalidation (worse performance), or you need to record all the VAs that
> > this pa has been bound to and invalidate all those VA ranges - ugly
> > programming.
> 
> Someone can set up some benchmarking? :)
> 
> Regards,
> 
> Tvrtko
> 
> >
> >
> > Thanks,
> > Oak
> >
> >> -Original Message-
> >> From: Tvrtko Ursulin 
> >> Sent: June 24, 2022 4:32 AM
> >> To: Zeng, Oak ; Landwerlin, Lionel G
> >> ; Vishwanathapura, Niranjana
> >> 
> >> Cc: Zanoni, Paulo R ; intel-
> >> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> >> Hellstrom, Thomas ; Wilson, Chris P
> >> ; Vetter, Daniel ;
> >> christian.koe...@amd.com; Auld, Matthew 
> >> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
> >> definition
> >>
> >>
> >> On 23/06/2022 22:05, Zeng, Oak wrote:
> >>>> -Original Message-
> >>>> From: Intel-gfx  On Behalf
> >>>> Of Tvrtko Ursulin
> >>>> Sent: June 23, 2022 7:06 AM
> >>>> To: Landwerlin, Lionel G ;
> >>>> Vishwanathapura, Niranjana 
> >>>> Cc: Zanoni, Paulo R ;
> >>>> intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> >>>> Hellstrom, Thomas ; Wilson, Chris P
> >>>> ; Vetter, Daniel
> >>>> ; christian.koe...@amd.com; Auld,
> Matthew
> >>>> 
> >>>> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
> >>>> definition
> >>>>
> >>>>
> >>>> On 23/06/2022 09:57, Lionel Landwerlin wrote:
> >>>>> On 23/06/2022 11:27, Tvrtko Ursulin wrote:
> &g

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-27 Thread Tvrtko Ursulin



On 24/06/2022 21:23, Zeng, Oak wrote:

Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing 
storage":

Correctness:
consider this sequence of:
1. unbind va1 from pa1,
2. then bind va1 to pa2. //user space has the freedom to do this as it manages 
virtual address space
3. Submit shader code using va1,
4. Then retire pa1.

If you don't perform tlb invalidate at step #1, in step #3, shader will use 
stale entries in tlb and pa1 will be used for the shader. User want to use pa2. 
So I don't think invalidate tlb at step #4 make correctness.


Define step 3. Is it a new execbuf? If so then there will be a TLB flush 
there. Unless the plan is to stop doing that with eb3 but I haven't 
picked up on that anywhere so far.



Performance:
It is straightforward to invalidate the tlb at step 1. If the platform
supports range-based tlb invalidation, we can perform a range-based
invalidation easily at step 1.


If the platform supports range-based invalidation, yes. If it doesn't _and_
the flush at unbind is not needed for 99% of use cases then it is simply a
waste.



If you do it at step 4, you either need to perform a whole gt tlb invalidation
(worse performance), or you need to record all the VAs that this pa has been
bound to and invalidate all those VA ranges - ugly programming.


Someone can set up some benchmarking? :)

Regards,

Tvrtko




Thanks,
Oak


-Original Message-
From: Tvrtko Ursulin 
Sent: June 24, 2022 4:32 AM
To: Zeng, Oak ; Landwerlin, Lionel G
; Vishwanathapura, Niranjana

Cc: Zanoni, Paulo R ; intel-
g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Hellstrom,
Thomas ; Wilson, Chris P
; Vetter, Daniel ;
christian.koe...@amd.com; Auld, Matthew 
Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition


On 23/06/2022 22:05, Zeng, Oak wrote:

-Original Message-
From: Intel-gfx  On Behalf
Of Tvrtko Ursulin
Sent: June 23, 2022 7:06 AM
To: Landwerlin, Lionel G ;
Vishwanathapura, Niranjana 
Cc: Zanoni, Paulo R ;
intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
Hellstrom, Thomas ; Wilson, Chris P
; Vetter, Daniel ;
christian.koe...@amd.com; Auld, Matthew 
Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
definition


On 23/06/2022 09:57, Lionel Landwerlin wrote:

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, the UMD can re-bind the same VA range against an active
VM. Though I am not sure, for the Mesa use case, whether that new mapping is
required for the running GPU job or it will be for the next submission. But
by ensuring the tlb flush upon unbind, the KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work
> >>> then they get the flush as part of the batch buffer preamble.


> >>> In the non-sparse case, if a VA range is unbound, it is invalid to
use that range for anything until it has been rebound by something else.

We'll take the fence provided by vm_bind and put it as a wait fence
on the next execbuffer.

It might be safer in case of memory over-fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


Currently TLB flush happens from the ring before every BB_START and
also when i915 returns the backing store pages to the system.



Can you explain more why we tlb flush when i915 retires the backing storage?
I never figured that out when I looked at the code. As I understand it, the
tlb caches the gpu page tables, which map a va to a pa. So it is
straightforward to me that we perform a tlb flush when we change the page
table (either at vm bind or unbind time; better at unbind time for
performance reasons).

I don't know what performs better - someone can measure the two
approaches? Certainly on platforms where we only have global TLB flushing
the cost is quite high so my thinking was to allow i915 to control when it will
be done and not guarantee it in the uapi if it isn't needed for security 
reasons.


But it is rather tricky to me to flush the tlb when we retire a backing
storage. I don't see how backing storage can be connected to the page table.
Let's say the user unbinds va1 from pa1, then binds va1 to pa2, then retires
pa1, then submits shader code using va1. If we don't tlb flush after
unbinding va1, the new shader code, which is supposed to use pa2, will still
use pa1 due to the stale entries in the tlb, right? The point is, the tlb
cache is tagged with the virtual address, not the physical address. So after
we unbind va1 from pa1, regardless of whether we retire pa1 or not, va1 can
be bound to another pa2.

When you say "retire pa1" I will assume you meant release backing storage
for pa1. At this point i915 currently does do the TLB flush and that ensures no
PTE can point to pa1.

This approach deals with security of the system as a whole. Client may still
cause rendering corruption or a GPU hang for itself but that should be
completely isolated. (This is the part where you say "regardless if we retire
pa1 or not" I think.)

But I think those are 

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-24 Thread Zeng, Oak
Let's compare "tlb invalidate at vm unbind" vs "tlb invalidate at backing 
storage":

Correctness: 
consider this sequence of:
1. unbind va1 from pa1, 
2. then bind va1 to pa2. //user space has the freedom to do this as it manages 
virtual address space
3. Submit shader code using va1, 
4. Then retire pa1. 

If you don't perform tlb invalidate at step #1, in step #3, shader will use 
stale entries in tlb and pa1 will be used for the shader. User want to use pa2. 
So I don't think invalidate tlb at step #4 make correctness.
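
A minimal userspace sketch of that exact sequence against the proposed uapi;
the wrapper functions are hypothetical stand-ins for the
VM_UNBIND/VM_BIND/execbuf3 ioctls (declarations only, for illustration):

#include <stdint.h>

int vm_unbind(int fd, uint32_t vm_id, uint64_t va, uint64_t length);
int vm_bind(int fd, uint32_t vm_id, uint32_t handle, uint64_t va,
	    uint64_t length);
int submit_shader(int fd, uint32_t vm_id, uint64_t shader_va);
int gem_close(int fd, uint32_t handle);

void stale_tlb_sequence(int fd, uint32_t vm_id, uint64_t va1, uint64_t len,
			uint32_t bo1 /* backed by pa1 */,
			uint32_t bo2 /* backed by pa2 */)
{
	vm_unbind(fd, vm_id, va1, len);    /* 1. no TLB invalidate here...   */
	vm_bind(fd, vm_id, bo2, va1, len); /* 2. va1 now maps to pa2         */
	submit_shader(fd, vm_id, va1);     /* 3. a stale TLB entry can still */
	                                   /*    resolve va1 -> pa1          */
	gem_close(fd, bo1);                /* 4. only now is pa1's backing   */
	                                   /*    store released (and flushed) */
}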


Performance:
It is straightforward to invalidate the tlb at step 1. If the platform
supports range-based tlb invalidation, we can perform a range-based
invalidation easily at step 1.
If you do it at step 4, you either need to perform a whole gt tlb invalidation
(worse performance), or you need to record all the VAs that this pa has been
bound to and invalidate all those VA ranges - ugly programming.
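
For contrast, a sketch of the bookkeeping the step-4 approach would need if it
wanted range-based invalidation instead of a whole-gt flush; these structures
are illustrative only, not actual i915 code:

#include <stdint.h>

struct va_range {
	uint64_t start;
	uint64_t length;
	uint32_t vm_id;
	struct va_range *next;
};

/* Every object must remember every VA range it was ever bound to. */
struct backing_object {
	uint32_t handle;
	struct va_range *bound_ranges; /* grows on every vm_bind */
};

/* On backing store release, walk the list and invalidate each range. */
static void release_backing(struct backing_object *obj,
			    void (*tlb_inval_range)(uint32_t vm_id,
						    uint64_t start,
						    uint64_t length))
{
	for (struct va_range *r = obj->bound_ranges; r; r = r->next)
		tlb_inval_range(r->vm_id, r->start, r->length);
}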


Thanks,
Oak

> -Original Message-
> From: Tvrtko Ursulin 
> Sent: June 24, 2022 4:32 AM
> To: Zeng, Oak ; Landwerlin, Lionel G
> ; Vishwanathapura, Niranjana
> 
> Cc: Zanoni, Paulo R ; intel-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Hellstrom,
> Thomas ; Wilson, Chris P
> ; Vetter, Daniel ;
> christian.koe...@amd.com; Auld, Matthew 
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 23/06/2022 22:05, Zeng, Oak wrote:
> >> -Original Message-
> >> From: Intel-gfx  On Behalf
> >> Of Tvrtko Ursulin
> >> Sent: June 23, 2022 7:06 AM
> >> To: Landwerlin, Lionel G ;
> >> Vishwanathapura, Niranjana 
> >> Cc: Zanoni, Paulo R ;
> >> intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> >> Hellstrom, Thomas ; Wilson, Chris P
> >> ; Vetter, Daniel ;
> >> christian.koe...@amd.com; Auld, Matthew 
> >> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi
> >> definition
> >>
> >>
> >> On 23/06/2022 09:57, Lionel Landwerlin wrote:
> >>> On 23/06/2022 11:27, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> After a vm_unbind, the UMD can re-bind the same VA range against an
> >>>>> active VM. Though I am not sure, for the Mesa use case, whether that
> >>>>> new mapping is required for the running GPU job or it will be for the
> >>>>> next submission. But by ensuring the tlb flush upon unbind, the KMD
> >>>>> can ensure correctness.
> >>>>
> >>>> Isn't that their problem? If they re-bind for submitting _new_ work
> >>>> then they get the flush as part of the batch buffer preamble.
> >>>
> >>> In the non-sparse case, if a VA range is unbound, it is invalid to
> >>> use that range for anything until it has been rebound by something else.
> >>>
> >>> We'll take the fence provided by vm_bind and put it as a wait fence
> >>> on the next execbuffer.
> >>>
> >>> It might be safer in case of memory over-fetching?
> >>>
> >>>
> >>> TLB flush will have to happen at some point right?
> >>>
> >>> What's the alternative to do it in unbind?
> >>
> >> Currently TLB flush happens from the ring before every BB_START and
> >> also when i915 returns the backing store pages to the system.
> >
> >
> > Can you explain more why we tlb flush when i915 retires the backing
> > storage? I never figured that out when I looked at the code. As I
> > understand it, the tlb caches the gpu page tables, which map a va to a pa.
> > So it is straightforward to me that we perform a tlb flush when we change
> > the page table (either at vm bind or unbind time; better at unbind time
> > for performance reasons).
> 
> I don't know what performs better - someone can measure the two
> approaches? Certainly on platforms where we only have global TLB flushing
> the cost is quite high so my thinking was to allow i915 to control when it 
> will
> be done and not guarantee it in the uapi if it isn't needed for security 
> reasons.
> 
> > But it is rather tricky to me to flush the tlb when we retire a backing
> > storage. I don't see how backing storage can be connected to the page
> > table. Let's say the user unbinds va1 from pa1, then binds va1 to pa2,
> > then retires pa1, then submits shader code using va1. If we don't tlb
> > flush after unbinding va1, the new shader code, which is supposed to use
> > pa2, will still use pa1 due to the stale entries in the tlb, right? The
> > point is, the tlb cache is tagged with the virtual address, not the
> > physical address. So after we unbind va1 from 

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-24 Thread Tvrtko Ursulin



On 23/06/2022 22:05, Zeng, Oak wrote:

-Original Message-
From: Intel-gfx  On Behalf Of Tvrtko
Ursulin
Sent: June 23, 2022 7:06 AM
To: Landwerlin, Lionel G ; Vishwanathapura,
Niranjana 
Cc: Zanoni, Paulo R ; intel-gfx@lists.freedesktop.org;
dri-de...@lists.freedesktop.org; Hellstrom, Thomas ;
Wilson, Chris P ; Vetter, Daniel
; christian.koe...@amd.com; Auld, Matthew

Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition


On 23/06/2022 09:57, Lionel Landwerlin wrote:

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, the UMD can re-bind the same VA range against an active
VM. Though I am not sure, for the Mesa use case, whether that new mapping is
required for the running GPU job or it will be for the next submission. But
by ensuring the tlb flush upon unbind, the KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work
then they get the flush as part of the batch buffer preamble.


In the non-sparse case, if a VA range is unbound, it is invalid to use
that range for anything until it has been rebound by something else.

We'll take the fence provided by vm_bind and put it as a wait fence on
the next execbuffer.

It might be safer in case of memory over-fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


Currently TLB flush happens from the ring before every BB_START and also
when i915 returns the backing store pages to the system.



Can you explain more why we tlb flush when i915 retires the backing storage?
I never figured that out when I looked at the code. As I understand it, the
tlb caches the gpu page tables, which map a va to a pa. So it is
straightforward to me that we perform a tlb flush when we change the page
table (either at vm bind or unbind time; better at unbind time for
performance reasons).


I don't know what performs better - someone can measure the two 
approaches? Certainly on platforms where we only have global TLB 
flushing the cost is quite high so my thinking was to allow i915 to 
control when it will be done and not guarantee it in the uapi if it 
isn't needed for security reasons.



But it is rather tricky to me to flush the tlb when we retire a backing
storage. I don't see how backing storage can be connected to the page table.
Let's say the user unbinds va1 from pa1, then binds va1 to pa2, then retires
pa1, then submits shader code using va1. If we don't tlb flush after
unbinding va1, the new shader code, which is supposed to use pa2, will still
use pa1 due to the stale entries in the tlb, right? The point is, the tlb
cache is tagged with the virtual address, not the physical address. So after
we unbind va1 from pa1, regardless of whether we retire pa1 or not, va1 can
be bound to another pa2.


When you say "retire pa1" I will assume you meant release backing 
storage for pa1. At this point i915 currently does do the TLB flush and 
that ensures no PTE can point to pa1.


This approach deals with security of the system as a whole. Client may 
still cause rendering corruption or a GPU hang for itself but that 
should be completely isolated. (This is the part where you say 
"regardless if we retire pa1 or not" I think.)


But I think those are advanced use cases where userspace wants to 
manipulate PTEs while something is running on the GPU in parallel. AFAIK 
limited to compute "infinite batch" so my thinking is to avoid adding a 
performance penalty to the common case. Especially on platforms which 
only have global flush.


But.. to circle back on the measuring angle. Until someone invests time 
and effort to benchmark the two approaches (flush on unbind vs flush on 
backing store release) we don't really know. All I know is the perf hit 
with the current solution was significant, AFAIR up to teen digits on 
some games. And considering the flushes were driven only by the shrinker 
activity, my thinking was they would be less frequent than the unbinds, 
therefore have the potential for a smaller perf hit.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Zeng, Oak


Regards,
Oak

> -Original Message-
> From: Intel-gfx  On Behalf Of Tvrtko
> Ursulin
> Sent: June 23, 2022 7:06 AM
> To: Landwerlin, Lionel G ; Vishwanathapura,
> Niranjana 
> Cc: Zanoni, Paulo R ; 
> intel-gfx@lists.freedesktop.org;
> dri-de...@lists.freedesktop.org; Hellstrom, Thomas 
> ;
> Wilson, Chris P ; Vetter, Daniel
> ; christian.koe...@amd.com; Auld, Matthew
> 
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
> 
> 
> On 23/06/2022 09:57, Lionel Landwerlin wrote:
> > On 23/06/2022 11:27, Tvrtko Ursulin wrote:
> >>>
> >>> After a vm_unbind, the UMD can re-bind the same VA range against an
> >>> active VM. Though I am not sure, for the Mesa use case, whether that new
> >>> mapping is required for the running GPU job or it will be for the next
> >>> submission. But by ensuring the tlb flush upon unbind, the KMD can
> >>> ensure correctness.
> >>
> >> Isn't that their problem? If they re-bind for submitting _new_ work
> >> then they get the flush as part of the batch buffer preamble.
> >
> > In the non-sparse case, if a VA range is unbound, it is invalid to use
> > that range for anything until it has been rebound by something else.
> >
> > We'll take the fence provided by vm_bind and put it as a wait fence on
> > the next execbuffer.
> >
> > It might be safer in case of memory over-fetching?
> >
> >
> > TLB flush will have to happen at some point right?
> >
> > What's the alternative to do it in unbind?
> 
> Currently TLB flush happens from the ring before every BB_START and also
> when i915 returns the backing store pages to the system.


Can you explain more why we tlb flush when i915 retires the backing storage?
I never figured that out when I looked at the code. As I understand it, the
tlb caches the gpu page tables, which map a va to a pa. So it is
straightforward to me that we perform a tlb flush when we change the page
table (either at vm bind or unbind time; better at unbind time for
performance reasons).

But it is rather tricky to me to flush the tlb when we retire a backing
storage. I don't see how backing storage can be connected to the page table.
Let's say the user unbinds va1 from pa1, then binds va1 to pa2, then retires
pa1, then submits shader code using va1. If we don't tlb flush after
unbinding va1, the new shader code, which is supposed to use pa2, will still
use pa1 due to the stale entries in the tlb, right? The point is, the tlb
cache is tagged with the virtual address, not the physical address. So after
we unbind va1 from pa1, regardless of whether we retire pa1 or not, va1 can
be bound to another pa2.
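
A toy model of the VA-tagged lookup described above; real GPU TLBs are of
course more complex, but this shows why a stale entry keeps winning until it
is explicitly invalidated, regardless of what happens to pa1's backing store:

#include <stdbool.h>
#include <stdint.h>

struct tlb_entry {
	uint64_t va;
	uint64_t pa;
	bool valid;
};

/* Lookup is keyed purely by VA; if the PTE for 'va' changed but no
 * invalidate was issued, the old va -> pa translation still hits. */
static uint64_t tlb_translate(struct tlb_entry *tlb, int n, uint64_t va,
			      uint64_t (*walk_page_table)(uint64_t va))
{
	for (int i = 0; i < n; i++)
		if (tlb[i].valid && tlb[i].va == va)
			return tlb[i].pa; /* possibly stale hit */

	return walk_page_table(va);       /* miss: sees the new PTE */
}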

Thanks,
Oak 


> 
> For the former, I haven't seen any mention that for execbuf3 there are
> plans to stop doing it? Anyway, as long as this is kept and sequence of
> bind[1..N]+execbuf is safe and correctly sees all the preceding binds.
> Hence about the alternative to doing it in unbind - first I think let's
> state the problem it is trying to solve.
> 
> For instance, is it just for the compute "append work to the running
> batch" use case? I honestly don't remember how that was supposed to work,
> so maybe the tlb flush on bind was supposed to deal with that scenario?
> 
> Or you see a problem even for Mesa with the current model?
> 
> Regards,
> 
> Tvrtko


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Niranjana Vishwanathapura

On Thu, Jun 23, 2022 at 09:27:22AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 17:44, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 04:57:17PM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 16:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode 
of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will 
not accept any

+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind 
completion notification.

+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+ * @value: A point in the timeline.
+ * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+ * timeline drm_syncobj is invalid as it turns a 
drm_syncobj into a

+ * binary one.
+ */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies 
the mapping of GPU
+ * virtual address (VA) range to the section of an object 
that should be bound

+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not 
currently bound) and can
+ * be mapped to whole object or a section of the object 
(partial binding).
+ * Multiple VA mappings can be created to the same section 
of the object

+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However DG2
+ * and XEHPSDV have a 64K page size for device local-memory and have a
+ * compact page table. On those platforms, for binding device local-memory
+ * objects, the @start should be 2M aligned, @offset and @length should be
+ * 64K aligned.


Should some error codes be documented and has the ability to 
programmatically probe the alignment restrictions been 
considered?




Currently what we have internally is that -EINVAL is returned if the start,
offset and length are not aligned. If the specified mapping already exists,
we return -EEXIST. If there are conflicts in the VA range and the VA range
can't be reserved, then -ENOSPC is returned. I can add this documentation
here, but I am worried that there will be more suggestions/feedback about
error codes while reviewing the code patch series, and we will have to
revisit it again.


I'd still suggest documenting those three. It makes sense to 
explain to userspace what behaviour they will see if they get it 
wrong.




Ok.

+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.


Text should be clear whether "not allowed" means there will be 
an error returned, or it will appear to work but bad things 
will happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;
+
+    /** @start: Virtual Address start to bind */
+    __u64 start;
+

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Niranjana Vishwanathapura

On Thu, Jun 23, 2022 at 12:28:32PM +0300, Lionel Landwerlin wrote:

On 22/06/2022 18:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of 
operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not 
accept any

+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion 
notification.

+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+ * @value: A point in the timeline.
+ * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+ * timeline drm_syncobj is invalid as it turns a 
drm_syncobj into a

+ * binary one.
+ */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the 
mapping of GPU
+ * virtual address (VA) range to the section of an object that 
should be bound

+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently 
bound) and can
+ * be mapped to whole object or a section of the object 
(partial binding).
+ * Multiple VA mappings can be created to the same section of 
the object

+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However DG2
+ * and XEHPSDV have a 64K page size for device local-memory and have a
+ * compact page table. On those platforms, for binding device local-memory
+ * objects, the @start should be 2M aligned, @offset and @length should be
+ * 64K aligned.


Should some error codes be documented and has the ability to 
programmatically probe the alignment restrictions been considered?




Currently what we have internally is that -EINVAL is returned if the start,
offset and length are not aligned. If the specified mapping already exists,
we return -EEXIST. If there are conflicts in the VA range and the VA range
can't be reserved, then -ENOSPC is returned. I can add this documentation
here, but I am worried that there will be more suggestions/feedback about
error codes while reviewing the code patch series, and we will have to
revisit it again.



That's not really a good excuse to not document.



Yah, I have documented it in the v4 series I sent out.





+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.


Text should be clear whether "not allowed" means there will be an 
error returned, or it will appear to work but bad things will 
happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;
+
+    /** @start: Virtual Address start to bind */
+    __u64 start;
+
+    /** @offset: Offset in object to bind */
+    __u64 offset;
+
+    /** @length: Length of mapping to bind */
+    __u64 length;
+
+    /**
+ * 

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 23/06/2022 14:05, Tvrtko Ursulin wrote:


On 23/06/2022 09:57, Lionel Landwerlin wrote:

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, the UMD can re-bind the same VA range against an active
VM. Though I am not sure, for the Mesa use case, whether that new mapping is
required for the running GPU job or it will be for the next submission. But
by ensuring the tlb flush upon unbind, the KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work 
then they get the flush as part of the batch buffer preamble.


In the non-sparse case, if a VA range is unbound, it is invalid to
use that range for anything until it has been rebound by something else.


We'll take the fence provided by vm_bind and put it as a wait fence 
on the next execbuffer.


It might be safer in case of memory over-fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


Currently TLB flush happens from the ring before every BB_START and 
also when i915 returns the backing store pages to the system.


For the former, I haven't seen any mention that for execbuf3 there are 
plans to stop doing it? Anyway, as long as this is kept and sequence 
of bind[1..N]+execbuf is safe and correctly sees all the preceding binds.
Hence about the alternative to doing it in unbind - first I think let's
state the problem it is trying to solve.


For instance, is it just for the compute "append work to the running
batch" use case? I honestly don't remember how that was supposed to work,
so maybe the tlb flush on bind was supposed to deal with that scenario?


Or you see a problem even for Mesa with the current model?

Regards,

Tvrtko



As far as I can tell, all the binds should have completed before execbuf
starts if you follow the Vulkan sparse binding rules.


For non-sparse, the UMD will take care of it.

I think we're fine.
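
For reference, a sketch of how that ordering falls out of the Vulkan sparse
binding API: the bind operation signals a semaphore which the following
submit waits on. The handles are assumed to be created elsewhere; this is
illustrative, not a complete program:

#include <vulkan/vulkan.h>

void bind_then_submit(VkQueue queue,
		      const VkSparseBufferMemoryBindInfo *buffer_bind,
		      VkSemaphore bind_done, VkCommandBuffer cmd)
{
	/* Queue the sparse bind; it signals bind_done when complete. */
	VkBindSparseInfo bind_info = {
		.sType = VK_STRUCTURE_TYPE_BIND_SPARSE_INFO,
		.bufferBindCount = 1,
		.pBufferBinds = buffer_bind,
		.signalSemaphoreCount = 1,
		.pSignalSemaphores = &bind_done,
	};
	vkQueueBindSparse(queue, 1, &bind_info, VK_NULL_HANDLE);

	/* The actual work waits on the semaphore, so no command can run
	 * against the VA range before the bind has completed. */
	VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
	VkSubmitInfo submit = {
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.waitSemaphoreCount = 1,
		.pWaitSemaphores = &bind_done,
		.pWaitDstStageMask = &wait_stage,
		.commandBufferCount = 1,
		.pCommandBuffers = &cmd,
	};
	vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}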


-Lionel




Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Tvrtko Ursulin



On 23/06/2022 09:57, Lionel Landwerlin wrote:

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, the UMD can re-bind the same VA range against an active
VM. Though I am not sure, for the Mesa use case, whether that new mapping is
required for the running GPU job or it will be for the next submission. But
by ensuring the tlb flush upon unbind, the KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work 
then they get the flush as part of the batch buffer preamble.


In the non-sparse case, if a VA range is unbound, it is invalid to use
that range for anything until it has been rebound by something else.


We'll take the fence provided by vm_bind and put it as a wait fence on 
the next execbuffer.


It might be safer in case of memory over-fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


Currently TLB flush happens from the ring before every BB_START and also 
when i915 returns the backing store pages to the system.


For the former, I haven't seen any mention that for execbuf3 there are 
plans to stop doing it? Anyway, as long as this is kept and sequence of 
bind[1..N]+execbuf is safe and correctly sees all the preceding binds.
Hence about the alternative to doing it in unbind - first I think let's
state the problem it is trying to solve.


For instance, is it just for the compute "append work to the running
batch" use case? I honestly don't remember how that was supposed to work,
so maybe the tlb flush on bind was supposed to deal with that scenario?


Or you see a problem even for Mesa with the current model?

Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 22/06/2022 18:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of 
operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not 
accept any

+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion 
notification.

+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+ * @value: A point in the timeline.
+ * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+ * timeline drm_syncobj is invalid as it turns a drm_syncobj 
into a

+ * binary one.
+ */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the 
mapping of GPU
+ * virtual address (VA) range to the section of an object that 
should be bound

+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) 
and can
+ * be mapped to whole object or a section of the object (partial 
binding).
+ * Multiple VA mappings can be created to the same section of the 
object

+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However DG2
+ * and XEHPSDV have a 64K page size for device local-memory and have a
+ * compact page table. On those platforms, for binding device local-memory
+ * objects, the @start should be 2M aligned, @offset and @length should be
+ * 64K aligned.


Should some error codes be documented and has the ability to 
programmatically probe the alignment restrictions been considered?




Currently what we have internally is that -EINVAL is returned if the start,
offset and length are not aligned. If the specified mapping already exists,
we return -EEXIST. If there are conflicts in the VA range and the VA range
can't be reserved, then -ENOSPC is returned. I can add this documentation
here, but I am worried that there will be more suggestions/feedback about
error codes while reviewing the code patch series, and we will have to
revisit it again.



That's not really a good excuse to not document.




+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.


Text should be clear whether "not allowed" means there will be an 
error returned, or it will appear to work but bad things will happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;
+
+    /** @start: Virtual Address start to bind */
+    __u64 start;
+
+    /** @offset: Offset in object to bind */
+    __u64 offset;
+
+    /** @length: Length of mapping to bind */
+    __u64 length;
+
+    /**
+ * @flags: Supported flags are:
+ *
+ * I915_GEM_VM_BIND_FENCE_VALID:
+ * @fence is valid, needs bind completion 

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, the UMD can re-bind the same VA range against an active
VM. Though I am not sure, for the Mesa use case, whether that new mapping is
required for the running GPU job or it will be for the next submission. But
by ensuring the tlb flush upon unbind, the KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work 
then they get the flush as part of the batch buffer preamble.


In the non-sparse case, if a VA range is unbound, it is invalid to use
that range for anything until it has been rebound by something else.


We'll take the fence provided by vm_bind and put it as a wait fence on 
the next execbuffer.
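
A hedged sketch of that flow from the UMD side, using the existing upstream
drm_syncobj timeline-wait uapi on the syncobj handed to vm_bind as its
out-fence. A CPU-side wait is shown only for simplicity; in practice the
syncobj/point would be passed to the new execbuf as a wait fence:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

/* Block until the vm_bind out-fence (syncobj + timeline point) signals. */
static int wait_bind_complete(int fd, uint32_t syncobj, uint64_t point)
{
	struct drm_syncobj_timeline_wait wait = {
		.handles = (uintptr_t)&syncobj,
		.points = (uintptr_t)&point,
		.timeout_nsec = INT64_MAX,
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
	};

	return ioctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &wait);
}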


It might be safer in case of memory over-fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


-Lionel



Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Tvrtko Ursulin



On 22/06/2022 17:44, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 04:57:17PM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 16:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of 
operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not 
accept any

+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + 
DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion 
notification.

+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+ * @value: A point in the timeline.
+ * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+ * timeline drm_syncobj is invalid as it turns a drm_syncobj 
into a

+ * binary one.
+ */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the 
mapping of GPU
+ * virtual address (VA) range to the section of an object that 
should be bound

+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently 
bound) and can
+ * be mapped to whole object or a section of the object (partial 
binding).
+ * Multiple VA mappings can be created to the same section of the 
object

+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However DG2
+ * and XEHPSDV have a 64K page size for device local-memory and have a
+ * compact page table. On those platforms, for binding device local-memory
+ * objects, the @start should be 2M aligned, @offset and @length should be
+ * 64K aligned.


Should some error codes be documented and has the ability to 
programmatically probe the alignment restrictions been considered?




Currently what we have internally is that -EINVAL is returned if the start,
offset and length are not aligned. If the specified mapping already exists,
we return -EEXIST. If there are conflicts in the VA range and the VA range
can't be reserved, then -ENOSPC is returned. I can add this documentation
here, but I am worried that there will be more suggestions/feedback about
error codes while reviewing the code patch series, and we will have to
revisit it again.


I'd still suggest documenting those three. It makes sense to explain 
to userspace what behaviour they will see if they get it wrong.




Ok.

+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.


Text should be clear whether "not allowed" means there will be an 
error returned, or it will appear to work but bad things will happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;
+
+    /** @start: Virtual Address start to bind */
+    __u64 start;
+
+    /** @offset: Offset in object to bind */
+    __u64 offset;
+
+  

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-22 Thread Niranjana Vishwanathapura

On Wed, Jun 22, 2022 at 09:44:47AM -0700, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 04:57:17PM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 16:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will 
not accept any

+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)

+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion 
notification.

+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+ * @value: A point in the timeline.
+ * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+ * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+ * binary one.
+ */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to the VM_BIND ioctl and specifies the
+ * mapping of a GPU virtual address (VA) range to the section of an
+ * object that should be bound in the device page table of the specified
+ * address space (VM).
+ * The VA range specified must be unique (i.e., not currently bound) and
+ * can be mapped to the whole object or to a section of the object
+ * (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * @start, @offset and @length should be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local-memory and a
+ * compact page table. On those platforms, for binding device
+ * local-memory objects, @start should be 2M aligned, and @offset and
+ * @length should be 64K aligned.


Should some error codes be documented, and has the ability to
programmatically probe the alignment restrictions been considered?




Currently what we have internally is that -EINVAL is returned if the
start, offset and length are not aligned. If the specified mapping
already exists, we return -EEXIST. If there are conflicts in the VA
range and the VA range can't be reserved, then -ENOSPC is returned.
I can add this documentation here. But I am worried that there will
be more suggestions/feedback about error codes while reviewing the
code patch series, and we will have to revisit it again.


I'd still suggest documenting those three. It makes sense to explain 
to userspace what behaviour they will see if they get it wrong.




Ok.


I have posted v4 with the fixes. I have simplified the error codes a
bit by removing -EEXIST, which is just a special case of -ENOSPC.

Niranjana
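
Putting the alignment rules and the simplified error codes together,
a hedged sketch of a local-memory binding on DG2/XEHPSDV might look
like this (vm_id and bo_handle are assumed to exist, and the includes
from the earlier sketches plus errno.h are assumed):

    struct drm_i915_gem_vm_bind bind = {
            .vm_id  = vm_id,
            .handle = bo_handle,
            .start  = 2ull << 20,   /* VA: 2M aligned for local memory */
            .offset = 64ull << 10,  /* object offset: 64K aligned */
            .length = 64ull << 10,  /* size: 64K aligned */
    };

    if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind)) {
            /*
             * -EINVAL: misaligned start/offset/length.
             * -ENOSPC: VA range conflicts with an existing binding
             *          (v4 folds the old -EEXIST case into this).
             */
            return -errno;
    }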



+ * Also, on those platforms, it is not allowed to bind a device
+ * local-memory object and a system memory object in a single 2M
+ * section of VA range.


Text should be clear whether "not allowed" means there will be an
error returned, or it will appear to work but bad things will happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-22 Thread Niranjana Vishwanathapura

On Wed, Jun 22, 2022 at 04:57:17PM +0100, Tvrtko Ursulin wrote:

[snip]

I'd still suggest documenting those three. It makes sense to explain
to userspace what behaviour they will see if they get it wrong.


Ok.

[snip]

Text should be clear whether "not allowed" means there will be an
error returned, or it will appear to work but bad things will happen.


Yah, error returned, will fix.

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-22 Thread Tvrtko Ursulin



On 22/06/2022 16:12, Niranjana Vishwanathapura wrote:

[snip]

Currently what we have internally is that -EINVAL is returned if the
start, offset and length are not aligned. If the specified mapping
already exists, we return -EEXIST. If there are conflicts in the VA
range and the VA range can't be reserved, then -ENOSPC is returned.
I can add this documentation here. But I am worried that there will
be more suggestions/feedback about error codes while reviewing the
code patch series, and we will have to revisit it again.


I'd still suggest documenting those three. It makes sense to explain to
userspace what behaviour they will see if they get it wrong.

[snip]

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-22 Thread Niranjana Vishwanathapura

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:

[snip]

Should some error codes be documented, and has the ability to
programmatically probe the alignment restrictions been considered?


Currently what we have internally is that -EINVAL is returned if the
start, offset and length are not aligned. If the specified mapping
already exists, we return -EEXIST. If there are conflicts in the VA
range and the VA range can't be reserved, then -ENOSPC is returned.
I can add this documentation here. But I am worried that there will
be more suggestions/feedback about error codes while reviewing the
code patch series, and we will have to revisit it again.


[snip]

Text should be clear whether "not allowed" means there will be an
error returned, or it will appear to work but bad things will happen.


Yah, error returned, will fix.

[snip]

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-22 Thread Tvrtko Ursulin



On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

[snip]

+ * @start, @offset and @length should be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local-memory and a
+ * compact page table. On those platforms, for binding device
+ * local-memory objects, @start should be 2M aligned, and @offset and
+ * @length should be 64K aligned.


Should some error codes be documented, and has the ability to
programmatically probe the alignment restrictions been considered?


+ * Also, on those platforms, it is not allowed to bind a device
+ * local-memory object and a system memory object in a single 2M
+ * section of VA range.


Text should be clear whether "not allowed" means there will be an error
returned, or it will appear to work but bad things will happen.

[snip]
+   /**
+    * @flags: Supported flags are:
+    *
+    * I915_GEM_VM_BIND_FENCE_VALID:
+    * @fence is valid, needs bind completion notification.
+    *
+    * I915_GEM_VM_BIND_READONLY:
+    * Mapping is read-only.
+    *
+    * I915_GEM_VM_BIND_CAPTURE:
+    * Capture this mapping in the dump upon GPU error.
+    *
+    * I915_GEM_VM_BIND_TLB_FLUSH:
+    * Flush the TLB for the specified range after bind completion.
+    */
+   __u64 flags;
+#define I915_GEM_VM_BIND_FENCE_VALID   (1 << 0)
+#define I915_GEM_VM_BIND_READONLY  (1 << 1)
+#define I915_GEM_VM_BIND_CAPTURE   (1 << 2)
+#define I915_GEM_VM_BIND_TLB_FLUSH (1 << 3)


What is the use case
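
Leaving that open question aside, a hedged sketch of how a client might
combine these flags with the out-fence, continuing the earlier bind and
syncobj sketches (the @fence member name is inferred from the
FENCE_VALID description, since the tail of the struct is truncated in
this archive):

    bind.flags = I915_GEM_VM_BIND_FENCE_VALID | I915_GEM_VM_BIND_CAPTURE;
    bind.fence.handle = syncobj;  /* signalled when the bind completes */
    bind.fence.value  = 1;        /* timeline point, as described earlier */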