Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 10:25:15AM +0200, Christian König wrote:
> On 31.07.19 at 10:05, Daniel Vetter wrote:
> > [SNIP]
> > > > Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> > > > was.  The discussion helped clear up several bits of confusion on my part.
> > > > From proposed names, I find MAPPED and PINNED slightly confusing.
> > > > In terms of backing store description, maybe these are a little better:
> > > > DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
> > > > DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)
> > > That's still not correct. Let me describe what each of the three stands
> > > for:
> > > 
> > > 1. The backing store is a shmem file, so the individual pages are
> > > swappable by the core OS.
> > > 2. The backing store is allocated GPU accessible but not currently in use
> > > by the GPU.
> > > 3. The backing store is currently in use by the GPU.
> > > 
> > > For i915 all three of those are basically the same and you don't need to
> > > worry about it much.
> > We do pretty much have these three states for i915 gem bo too. Of
> > course none have a reasonable upper limit since it's all shared
> > memory. The hard limit would be system memory + swap for 1, and only
> > system memory for 2 and 3.
> 
> Good to know.
> 
> > > But for other drivers that's certainly not true and we need this
> > > distinction of the backing store of an object.
> > > 
> > > I'm just not sure how we would handle that for cgroups. From experience
> > > we certainly want a limit over all 3, but you usually also want to limit
> > > 3 alone.
> > To avoid lolz against the shrinker I think you also want to limit 2+3.
> > Afaiui ttm does that with the global limit, to avoid driving the
> > system against the wall.
> 
> Yes, exactly. But I think you only need that when 2+3 are not backed by
> pinning shmem. E.g. for i915 I'm not sure you want this limitation.

Maybe I need to share how bad exactly the i915 driver is fighting its own
shrinker at the next conference, over some good drinks ... Just because we
use shmem directly doesn't make this easier really at all, we're still
pinning memory that the core mm can't evict anymore.

> > [SNIP]
> > > #1 and #2 in my example above should probably not be configured by the
> > > driver itself.
> > > 
> > > And yes seeing those as special for state handling sounds like the
> > > correct approach to me.
> > Do we have any hw that wants custom versions of 3?
> 
> I can't think of any. If a driver needs something special for 3 then that
> should be domain VRAM or domain PRIV.
> 
> As far as I can see with the proposed separation we can even handle AGP.
> 
> > The only hw designs
> > I know of either have one shared translation table (but only one per
> > device, so having just 1 domain is good enough). Or TT mappings are in
> > the per-process pagetables, and then you're de facto unlimited (and
> > again one domain is good enough). So roughly:
> > 
> > - 1&2 global across all drivers. 1 and 2 are disjoint (i.e. a bo is
> > only accounted to one of them, never both).
> > - 3 would be a subgroup of 2, and per device. A bo in group 3 is also
> > always in group 2.
> 
> Yes, that sounds like a good description, certainly like the right way to
> see it.
> 
> > For VRAM and VRAM-similar things (like stolen system memory, or if you
> > have VRAM that's somehow split up like with a dual gpu perhaps) I
> > agree the driver needs to register that. And we just have some
> > standard flags indicating that "this is kinda like VRAM".
> 
> Yeah, agree totally as well.

Cheers, Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > > > > > TTM was clearly missing that resulting in a whole bunch of extra
> > > > > > > > handling and rather complicated handling.
> > > > > > > > 
> > > > > > > > > +#define DRM_MEM_SYSTEM 0
> > > > > > > > > +#define DRM_MEM_STOLEN 1
> > > > > > > > I think we need a better naming for that.
> > > > > > > > 
> > > > > > > > STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> > > > > > > > least for TTM this is the system memory currently GPU accessible.
> > > > > > > Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
> > > > > > > translation table window into system memory. Not the same thing at all.
> > > > > > Thought so. The closest I have in mind is GTT, but everything else works
> > > > > > as well.
> > > > > Would your GPU_MAPPED above work for TT? I think we'll also need
> > > > > STOLEN, I'm even hearing noises that there's going to be stolen for
> > > > > discrete vram for us ... Also if we expand I guess we need to teach
> > > > > ttm to cope with more, or maybe treat the DRM one as some kind of
> > > > > sub-flavour.
> > > > Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
> > > > DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
> > > > I suggested above, I'm not sure.

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Christian König

On 31.07.19 at 10:05, Daniel Vetter wrote:
> [SNIP]
>>> Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
>>> was.  The discussion helped clear up several bits of confusion on my part.
>>> From proposed names, I find MAPPED and PINNED slightly confusing.
>>> In terms of backing store description, maybe these are a little better:
>>> DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
>>> DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)
>> That's still not correct. Let me describe what each of the three stands for:
>>
>> 1. The backing store is a shmem file, so the individual pages are
>> swappable by the core OS.
>> 2. The backing store is allocated GPU accessible but not currently in use
>> by the GPU.
>> 3. The backing store is currently in use by the GPU.
>>
>> For i915 all three of those are basically the same and you don't need to
>> worry about it much.
> We do pretty much have these three states for i915 gem bo too. Of
> course none have a reasonable upper limit since it's all shared
> memory. The hard limit would be system memory + swap for 1, and only
> system memory for 2 and 3.

Good to know.

>> But for other drivers that's certainly not true and we need this
>> distinction of the backing store of an object.
>>
>> I'm just not sure how we would handle that for cgroups. From experience
>> we certainly want a limit over all 3, but you usually also want to limit
>> 3 alone.
> To avoid lolz against the shrinker I think you also want to limit 2+3.
> Afaiui ttm does that with the global limit, to avoid driving the
> system against the wall.

Yes, exactly. But I think you only need that when 2+3 are not backed by
pinning shmem. E.g. for i915 I'm not sure you want this limitation.

> [SNIP]
>> #1 and #2 in my example above should probably not be configured by the
>> driver itself.
>>
>> And yes, seeing those as special for state handling sounds like the
>> correct approach to me.
> Do we have any hw that wants custom versions of 3?

I can't think of any. If a driver needs something special for 3 then
that should be domain VRAM or domain PRIV.

As far as I can see, with the proposed separation we can even handle AGP.

> The only hw designs
> I know of either have one shared translation table (but only one per
> device, so having just 1 domain is good enough). Or TT mappings are in
> the per-process pagetables, and then you're de facto unlimited (and
> again one domain is good enough). So roughly:
>
> - 1&2 global across all drivers. 1 and 2 are disjoint (i.e. a bo is
> only accounted to one of them, never both).
> - 3 would be a subgroup of 2, and per device. A bo in group 3 is also
> always in group 2.

Yes, that sounds like a good description, certainly like the right way
to see it.

> For VRAM and VRAM-similar things (like stolen system memory, or if you
> have VRAM that's somehow split up like with a dual gpu perhaps) I
> agree the driver needs to register that. And we just have some
> standard flags indicating that "this is kinda like VRAM".

Yeah, agree totally as well.

Christian.

> -Daniel
>> Regards,
>> Christian.
>>>>>>> TTM was clearly missing that resulting in a whole bunch of extra
>>>>>>> handling and rather complicated handling.
>>>>>>>
>>>>>>>> +#define DRM_MEM_SYSTEM 0
>>>>>>>> +#define DRM_MEM_STOLEN 1
>>>>>>> I think we need a better naming for that.
>>>>>>>
>>>>>>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>>>>>>> least for TTM this is the system memory currently GPU accessible.
>>>>>> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
>>>>>> translation table window into system memory. Not the same thing at all.
>>>>> Thought so. The closest I have in mind is GTT, but everything else works
>>>>> as well.
>>>> Would your GPU_MAPPED above work for TT? I think we'll also need
>>>> STOLEN, I'm even hearing noises that there's going to be stolen for
>>>> discrete vram for us ... Also if we expand I guess we need to teach
>>>> ttm to cope with more, or maybe treat the DRM one as some kind of
>>>> sub-flavour.
>>> Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
>>> DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
>>> I suggested above, I'm not sure.
>>>
>>> -Brian
>>>> -Daniel
>>>>> Christian.
>>>>>> -Daniel
>>>>>>> Thanks for looking into that,
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 30.07.19 at 02:32, Brian Welty wrote:
>>>>>>>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>>>>>>>   I fixed the nit with ordering of header includes that Sam noted. ]
>>>>>>>>
>>>>>>>> This RFC series is a first implementation of some ideas expressed
>>>>>>>> earlier on dri-devel [1].
>>>>>>>>
>>>>>>>> Some of the goals (open for much debate) are:
>>>>>>>>   - Create common base structure (subclass) for memory regions (patch #1)
>>>>>>>>   - Create common memory region types (patch #2)
>>>>>>>>   - Create common set of memory_region function callbacks (based on
>>>>>>>> ttm_mem_type_manager_funcs and intel_memory_regions_ops)
>>>>>>>>   - Create common helpers that operate on drm_mem_region to be leveraged
>>>>>>>> by both TTM drivers and i915, reducing code duplication
>>>>>>>>   - Above might start with refactoring ttm_bo_manager.c as these are
>>>>>>>> helpers for using drm_mm's range allocator and could be made to

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 8:54 AM Koenig, Christian
 wrote:
>
> On 31.07.19 at 02:51, Brian Welty wrote:
> [SNIP]
> >>>>>> +/*
> >>>>>> + * Memory types for drm_mem_region
> >>>>>> + */
> >>>>> #define DRM_MEM_SWAP?
> >>>> btw what did you have in mind for this? Since we use shmem we kinda don't
> >>>> know whether the BO is actually swapped out or not, at least on the i915
> >>>> side. So this would be more
> >>>> NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
> >>> Yeah, the problem is not everybody can use shmem. For some use cases you
> >>> have to use memory allocated through dma_alloc_coherent().
> >>>
> >>> So to be able to swap this out you need a separate domain to copy it
> >>> from whatever is backing it currently to shmem.
> >>>
> >>> So we essentially have:
> >>> DRM_MEM_SYS_SWAPABLE
> >>> DRM_MEM_SYS_NOT_GPU_MAPPED
> >>> DRM_MEM_SYS_GPU_MAPPED
> >>>
> >>> Or something like that.
> >> Yeah i915-gem is similar. We opportunistically keep the pages pinned
> >> sometimes even if not currently mapped into the (what ttm calls) TT.
> >> So I think these three for system memory make sense for us too. I
> >> think that's similar (at least in spirit) to the dma_alloc cache you
> >> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
> >> we could have something like PINNED or so. Although it's not
> >> permanently pinned, so maybe that's confusing too.
> >>
> > Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> > was.  The discussion helped clear up several bits of confusion on my part.
> > From proposed names, I find MAPPED and PINNED slightly confusing.
> > In terms of backing store description, maybe these are a little better:
> >DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
> >DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)
>
> That's still not correct. Let me describe what each of the three stands for:
>
> 1. The backing store is a shmem file, so the individual pages are
> swappable by the core OS.
> 2. The backing store is allocated GPU accessible but not currently in use
> by the GPU.
> 3. The backing store is currently in use by the GPU.
>
> For i915 all three of those are basically the same and you don't need to
> worry about it much.

We do pretty much have these three states for i915 gem bo too. Of
course none have a reasonable upper limit since it's all shared
memory. The hard limit would be system memory + swap for 1, and only
system memory for 2 and 3.

> But for other drivers that's certainly not true and we need this
> distinction of the backing store of an object.
>
> I'm just not sure how we would handle that for cgroups. From experience
> we certainly want a limit over all 3, but you usually also want to limit
> 3 alone.

To avoid lolz against the shrinker I think you also want to limit 2+3.
Afaiui ttm does that with the global limit, to avoid driving the
system against the wall.

> And you also want to limit the amount of bytes moved between those
> states because each state transition might have a bandwidth cost
> associated with it.
>
> > Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
> > Per Christian's point about removing .start, seems it doesn't need to
> > matter.
>
> You should probably completely drop the idea of this being regions.
>
> And we should also rename them to something like drm_mem_domains to make
> that clear.

+1 on domains. Some of these domains might be physically contiguous
regions, but some clearly aren't.

> > Whatever we define for these sub-types, does it make sense for SYSTEM and
> > VRAM to each have them defined?
>
> No, absolutely not. VRAM as well as other private memory types are
> completely driver specific.
>
> > I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
> > configured by driver...  this is a fixed size partition of host memory?
> > Or it is a kind of dummy memory region just for swap implementation?
>
> #1 and #2 in my example above should probably not be configured by the
> driver itself.
>
> And yes seeing those as special for state handling sounds like the
> correct approach to me.

Do we have any hw that wants custom versions of 3? The only hw designs
I know of either have one shared translation table (but only one per
device, so having just 1 domain is good enough). Or TT mappings are in
the per-process pagetables, and then you're de facto unlimited (and
again one domain is good enough). So roughly:

- 1&2 global across all drivers. 1 and 2 are disjoint (i.e. a bo is
only accounted to one of them, never both).
- 3 would be a subgroup of 2, and per device. A bo in group 3 is also
always in group 2.

For VRAM and VRAM-similar things (like stolen system memory, or if you
have VRAM that's somehow split up like with a dual gpu perhaps) I
agree the driver needs to register that. And we just have some
standard flags indicating that "this is kinda like VRAM".
-Daniel

>
> Regards,
> Christian.
>
> > TTM was clearly missing that resulting in a whole bunch of extra
> > handling and rather complicated handling.

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
On 31.07.19 at 02:51, Brian Welty wrote:
[SNIP]
>>>>>> +/*
>>>>>> + * Memory types for drm_mem_region
>>>>>> + */
>>>>> #define DRM_MEM_SWAP?
>>>> btw what did you have in mind for this? Since we use shmem we kinda don't
>>>> know whether the BO is actually swapped out or not, at least on the i915
>>>> side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>>> Yeah, the problem is not everybody can use shmem. For some use cases you
>>> have to use memory allocated through dma_alloc_coherent().
>>>
>>> So to be able to swap this out you need a separate domain to copy it
>>> from whatever is backing it currently to shmem.
>>>
>>> So we essentially have:
>>> DRM_MEM_SYS_SWAPABLE
>>> DRM_MEM_SYS_NOT_GPU_MAPPED
>>> DRM_MEM_SYS_GPU_MAPPED
>>>
>>> Or something like that.
>> Yeah i915-gem is similar. We opportunistically keep the pages pinned
>> sometimes even if not currently mapped into the (what ttm calls) TT.
>> So I think these three for system memory make sense for us too. I
>> think that's similar (at least in spirit) to the dma_alloc cache you
>> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
>> we could have something like PINNED or so. Although it's not
>> permanently pinned, so maybe that's confusing too.
>>
> Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> was.  The discussion helped clear up several bits of confusion on my part.
> From proposed names, I find MAPPED and PINNED slightly confusing.
> In terms of backing store description, maybe these are a little better:
>DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
>DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)

That's still not correct. Let me describe what each of the three stands for:

1. The backing store is a shmem file, so the individual pages are
swappable by the core OS.
2. The backing store is allocated GPU accessible but not currently in use
by the GPU.
3. The backing store is currently in use by the GPU.

For i915 all three of those are basically the same and you don't need to
worry about it much.

But for other drivers that's certainly not true and we need this 
distinction of the backing store of an object.

I'm just not sure how we would handle that for cgroups. From experience 
we certainly want a limit over all 3, but you usually also want to limit 
3 alone.

And you also want to limit the amount of bytes moved between those 
states because each state transition might have a bandwidth cost 
associated with it.

> Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
> Per Christian's point about removing .start, seems it doesn't need to
> matter.

You should probably completely drop the idea of this being regions.

And we should also rename them to something like drm_mem_domains to make 
that clear.

> Whatever we define for these sub-types, does it make sense for SYSTEM and
> VRAM to each have them defined?

No, absolutely not. VRAM as well as other private memory types are 
completely driver specific.

> I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
> configured by driver...  this is a fixed size partition of host memory?
> Or it is a kind of dummy memory region just for swap implementation?

#1 and #2 in my example above should probably not be configured by the 
driver itself.

And yes seeing those as special for state handling sounds like the 
correct approach to me.

Regards,
Christian.

>>>>> TTM was clearly missing that resulting in a whole bunch of extra
>>>>> handling and rather complicated handling.
>>>>>
>>>>>> +#define DRM_MEM_SYSTEM 0
>>>>>> +#define DRM_MEM_STOLEN 1
>>>>> I think we need a better naming for that.
>>>>>
>>>>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>>>>> least for TTM this is the system memory currently GPU accessible.
>>>> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
>>>> translation table window into system memory. Not the same thing at all.
>>> Thought so. The closest I have in mind is GTT, but everything else works
>>> as well.
>> Would your GPU_MAPPED above work for TT? I think we'll also need
>> STOLEN, I'm even hearing noises that there's going to be stolen for
>> discrete vram for us ... Also if we expand I guess we need to teach
>> ttm to cope with more, or maybe treat the DRM one as some kind of
>> sub-flavour.
> Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
> DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
> I suggested above, I'm not sure.
>
> -Brian
>
>
>> -Daniel
>>
>>> Christian.
>>>
>>>> -Daniel
>>>>
>>>>> Thanks for looking into that,
>>>>> Christian.
>>>>>
>>>>> On 30.07.19 at 02:32, Brian Welty wrote:
>>>>>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>>>>>  I fixed the nit with ordering of header includes that Sam noted. ]
>>>>>>
>>>>>> This RFC series is a first implementation of some ideas expressed
>>>>>> earlier on dri-devel [1].

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Brian Welty

On 7/30/2019 2:34 AM, Daniel Vetter wrote:
> On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
>> Yeah, that looks like a good start. Just a couple of random design 
>> comments/requirements.
>>
>> First of all please restructure the changes so that you more or less 
>> have the following:
>> 1. Adding of the new structures and functionality without any change to 
>> existing code.
>> 2. Replacing the existing functionality in TTM and all of its drivers.
>> 3. Replacing the existing functionality in i915.
>>
>> This should make it much easier to review the new functionality when it 
>> is not mixed with existing TTM stuff.

Sure, understood.  But I hope it's fair that I wouldn't be updating all
drivers in an RFC series until there is a bit of clarity/agreement on any
path forward.  But I can include amdgpu patch next time.

>>
>>
>> Second please completely drop the concept of gpu_offset or start of the 
>> memory region like here:
>>> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
>> At least on AMD hardware we have the following address spaces which are 
>> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus 
>> addresses and physical addresses.
>>
>> Pushing a concept of a general GPU address space into the memory 
>> management was a rather bad design mistake in TTM and we should not 
>> repeat that here.
>>
>> A region should only consists of a size in bytes and (internal to the 
>> region manager) allocations in that region.

Got it. I was trying to include fields that seemed relevant to a base
structure and could then optionally be leveraged at the choice of device
driver.  But I see your point.

>>
>>
>> Third please don't use any CPU or architecture specific types in any 
>> data structures:
>>> +struct drm_mem_region {
>>> +   resource_size_t start; /* within GPU physical address space */
>>> +   resource_size_t io_start; /* BAR address (CPU accessible) */
>>> +   resource_size_t size;
>>
>> I knew that resource_size is mostly 64bit on modern architectures, but 
>> dGPUs are completely separate to the architecture and we always need 
>> 64bits here at least for AMD hardware.
>>
>> So this should either be always uint64_t, or something like 
>> gpu_resource_size which depends on what the compiled in drivers require 
>> if we really need that.
>>
>> And by the way: Please always use bytes for things like sizes and not 
>> number of pages, cause page size is again CPU/architecture specific and 
>> GPU drivers don't necessarily care about that.

Makes sense,  will fix.

Hmm, I did hope that at least the DRM cgroup controller could leverage
struct page_counter.  It nicely encapsulates much of what is needed for
managing a memory limit.  But well, this is off topic.

>>
>>
>> And here also a few direct comments on the code:
>>> +   union {
>>> +   struct drm_mm *mm;
>>> +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
>>> +   void *priv;
>>> +   };
>> Maybe just always use void *mm here.
> 
> I'd say lets drop this outright, and handle private data by embedding this
> structure in the right place. That's how we handle everything in drm now
> as much as possible, and it's so much cleaner. I think ttm still loves
> priv pointers a bit too much in some places.

Okay, I'll drop it until I can show it might be useful later.

>
>>> +   spinlock_t move_lock;
>>> +   struct dma_fence *move;
>>
>> That is TTM specific and I'm not sure if we want it in the common memory 
>> management handling.
>>
>> If we want that here we should probably replace the lock with some rcu 
>> and atomic fence pointer exchange first.
> 
> Yeah  not sure we want any of these details in this shared structure
> either.
> 

Thanks for the feedback. I can remove it too.
I was unsure whether there might be a case for having it in the future.

Well, struct drm_mem_region will be quite small then if it only has a
size and type field.
Hardly seems worth introducing a new structure if these are the only fields.
I know we thought it might benefit the cgroups controller, but I still hope
to find an earlier purpose it could serve.

-Brian


[snip]

>>
>> On 30.07.19 at 02:32, Brian Welty wrote:
>>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>>I fixed the nit with ordering of header includes that Sam noted. ]
>>>
>>> This RFC series is a first implementation of some ideas expressed
>>> earlier on dri-devel [1].
>>>
>>> Some of the goals (open for much debate) are:
>>>- Create common base structure (subclass) for memory regions (patch #1)
>>>- Create common memory region types (patch #2)
>>>- Create common set of memory_region function callbacks (based on
>>>  ttm_mem_type_manager_funcs and intel_memory_regions_ops)
>>>- Create common helpers that operate on drm_mem_region to be leveraged
>>>  by both TTM drivers and i915, reducing code duplication
>>>- Above might start with refactoring ttm_bo_manager.c as these are
>>>  helpers for using drm_mm's range allocator and could be made to
>>>  operate on DRM structures instead of TTM ones.

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Brian Welty

On 7/30/2019 3:45 AM, Daniel Vetter wrote:
> On Tue, Jul 30, 2019 at 12:24 PM Koenig, Christian
>  wrote:
>>
>> On 30.07.19 at 11:38, Daniel Vetter wrote:
>>> On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:

Snipped the feedback on struct drm_mem_region.
Will be easier to have a separate thread.


>>>>> +/*
>>>>> + * Memory types for drm_mem_region
>>>>> + */
>>>> #define DRM_MEM_SWAP?
>>> btw what did you have in mind for this? Since we use shmem we kinda don't
>>> know whether the BO is actually swapped out or not, at least on the i915
>>> side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>>
>> Yeah, the problem is not everybody can use shmem. For some use cases you
>> have to use memory allocated through dma_alloc_coherent().
>>
>> So to be able to swap this out you need a separate domain to copy it
>> from whatever is backing it currently to shmem.
>>
>> So we essentially have:
>> DRM_MEM_SYS_SWAPABLE
>> DRM_MEM_SYS_NOT_GPU_MAPPED
>> DRM_MEM_SYS_GPU_MAPPED
>>
>> Or something like that.
> 
> Yeah i915-gem is similar. We opportunistically keep the pages pinned
> sometimes even if not currently mapped into the (what ttm calls) TT.
> So I think these three for system memory make sense for us too. I
> think that's similar (at least in spirit) to the dma_alloc cache you
> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
> we could have something like PINNED or so. Although it's not
> permanently pinned, so maybe that's confusing too.
> 

Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
was.  The discussion helped clear up several bits of confusion on my part.
From proposed names, I find MAPPED and PINNED slightly confusing.
In terms of backing store description, maybe these are a little better:
  DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
  DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)

Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
Per Christian's point about removing .start, seems it doesn't need to
matter.

Whatever we define for these sub-types, does it make sense for SYSTEM and
VRAM to each have them defined?

I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
configured by driver...  this is a fixed size partition of host memory?
Or it is a kind of dummy memory region just for swap implementation?


>>>> TTM was clearly missing that resulting in a whole bunch of extra
>>>> handling and rather complicated handling.
>>>>
>>>>> +#define DRM_MEM_SYSTEM 0
>>>>> +#define DRM_MEM_STOLEN 1
>>>> I think we need a better naming for that.
>>>>
>>>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>>>> least for TTM this is the system memory currently GPU accessible.
>>> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
>>> translation table window into system memory. Not the same thing at all.
>>
>> Thought so. The closest I have in mind is GTT, but everything else works
>> as well.
> 
> Would your GPU_MAPPED above work for TT? I think we'll also need
> STOLEN, I'm even hearing noises that there's going to be stolen for
> discrete vram for us ... Also if we expand I guess we need to teach
> ttm to cope with more, or maybe treat the DRM one as some kind of
> sub-flavour.
Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
I suggested above, I'm not sure.

-Brian


> -Daniel
> 
>>
>> Christian.
>>
>>> -Daniel
>>>

>>>> Thanks for looking into that,
>>>> Christian.
>>>>
>>>> On 30.07.19 at 02:32, Brian Welty wrote:
> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
> I fixed the nit with ordering of header includes that Sam noted. ]
>
> This RFC series is a first implementation of some ideas expressed
> earlier on dri-devel [1].
>
> Some of the goals (open for much debate) are:
> - Create common base structure (subclass) for memory regions (patch #1)
> - Create common memory region types (patch #2)
> - Create common set of memory_region function callbacks (based on
>   ttm_mem_type_manager_funcs and intel_memory_regions_ops)
> - Create common helpers that operate on drm_mem_region to be leveraged
>   by both TTM drivers and i915, reducing code duplication
> - Above might start with refactoring ttm_bo_manager.c as these are
>   helpers for using drm_mm's range allocator and could be made to
>   operate on DRM structures instead of TTM ones.
> - Larger goal might be to make LRU management of GEM objects common, and
>   migrate those fields into drm_mem_region and drm_gem_object structures.
>
> Patches 1-2 implement the proposed struct drm_mem_region and adds
> associated common set of definitions for memory region type.
>
> Patch #3 is update to i915

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Daniel Vetter
On Tue, Jul 30, 2019 at 4:30 PM Michel Dänzer  wrote:
> On 2019-07-30 12:45 p.m., Daniel Vetter wrote:
> > On Tue, Jul 30, 2019 at 12:24 PM Koenig, Christian
> >  wrote:
> >> On 30.07.19 at 11:38, Daniel Vetter wrote:
> >>> On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
> >
> >>>>> +#define DRM_MEM_SYSTEM 0
> >>>>> +#define DRM_MEM_STOLEN 1
> >>>> I think we need a better naming for that.
> >>>>
> >>>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> >>>> least for TTM this is the system memory currently GPU accessible.
> >>> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
> >>> translation table window into system memory. Not the same thing at all.
> >>
> >> Thought so. The closest I have in mind is GTT, but everything else works
> >> as well.
> >
> > Would your GPU_MAPPED above work for TT? I think we'll also need
> > STOLEN, I'm even hearing noises that there's going to be stolen for
> > discrete vram for us ...
>
> Could i915 use DRM_MEM_PRIV for stolen? Or is there other hardware with
> something similar?

I don't think it matters much what we name it ... _PRIV sounds as good
as anything else. As long as we make it clear that userspace bo also
might end up in there I think it's all good.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Michel Dänzer
On 2019-07-30 12:45 p.m., Daniel Vetter wrote:
> On Tue, Jul 30, 2019 at 12:24 PM Koenig, Christian
>  wrote:
>> On 30.07.19 at 11:38, Daniel Vetter wrote:
>>> On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
> 
>>>>> +#define DRM_MEM_SYSTEM 0
>>>>> +#define DRM_MEM_STOLEN 1
>>>> I think we need a better naming for that.
>>>>
>>>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>>>> least for TTM this is the system memory currently GPU accessible.
>>> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
>>> translation table window into system memory. Not the same thing at all.
>>
>> Thought so. The closest I have in mind is GTT, but everything else works
>> as well.
> 
> Would your GPU_MAPPED above work for TT? I think we'll also need
> STOLEN, I'm even hearing noises that there's going to be stolen for
> discrete vram for us ...

Could i915 use DRM_MEM_PRIV for stolen? Or is there other hardware with
something similar?


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Daniel Vetter
On Tue, Jul 30, 2019 at 12:24 PM Koenig, Christian
 wrote:
>
> On 30.07.19 11:38, Daniel Vetter wrote:
> > On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
> >> Yeah, that looks like a good start. Just a couple of random design
> >> comments/requirements.
> >>
> >> First of all please restructure the changes so that you more or less
> >> have the following:
> >> 1. Adding of the new structures and functionality without any change to
> >> existing code.
> >> 2. Replacing the existing functionality in TTM and all of its drivers.
> >> 3. Replacing the existing functionality in i915.
> >>
> >> This should make it much easier to review the new functionality when it
> >> is not mixed with existing TTM stuff.
> >>
> >>
> >> Second please completely drop the concept of gpu_offset or start of the
> >> memory region like here:
> >>> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
> >> At least on AMD hardware we have the following address spaces which are
> >> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus
> >> addresses and physical addresses.
> >>
> >> Pushing a concept of a general GPU address space into the memory
> >> management was a rather bad design mistake in TTM and we should not
> >> repeat that here.
> >>
> >> A region should only consist of a size in bytes and (internal to the
> >> region manager) allocations in that region.
> >>
> >>
> >> Third please don't use any CPU or architecture specific types in any
> >> data structures:
> >>> +struct drm_mem_region {
> >>> +   resource_size_t start; /* within GPU physical address space */
> >>> +   resource_size_t io_start; /* BAR address (CPU accessible) */
> >>> +   resource_size_t size;
> >> I know that resource_size_t is mostly 64 bit on modern architectures, but
> >> dGPUs are completely separate from the architecture and we always need
> >> 64 bits here at least for AMD hardware.
> >>
> >> So this should either be always uint64_t, or something like
> >> gpu_resource_size which depends on what the compiled in drivers require
> >> if we really need that.
> >>
> >> And by the way: Please always use bytes for things like sizes and not
> >> number of pages, because page size is again CPU/architecture specific and
> >> GPU drivers don't necessarily care about that.
> >>
> >>
> >> And here also a few direct comments on the code:
> >>> +   union {
> >>> +   struct drm_mm *mm;
> >>> +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> >>> +   void *priv;
> >>> +   };
> >> Maybe just always use void *mm here.
> >>
> >>> +   spinlock_t move_lock;
> >>> +   struct dma_fence *move;
> >> That is TTM specific and I'm not sure if we want it in the common memory
> >> management handling.
> >>
> >> If we want that here we should probably replace the lock with some rcu
> >> and atomic fence pointer exchange first.
> >>
> >>> +/*
> >>> + * Memory types for drm_mem_region
> >>> + */
> >> #define DRM_MEM_SWAP ?
> > btw what did you have in mind for this? Since we use shmem we kinda don't
> > know whether the BO is actually swapped out or not, at least on the i915
> > side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>
> Yeah, the problem is not everybody can use shmem. For some use cases you
> have to use memory allocated through dma_alloc_coherent().
>
> So to be able to swap this out you need a separate domain to copy it
> from whatever is backing it currently to shmem.
>
> So we essentially have:
> DRM_MEM_SYS_SWAPABLE
> DRM_MEM_SYS_NOT_GPU_MAPPED
> DRM_MEM_SYS_GPU_MAPPED
>
> Or something like that.

Yeah i915-gem is similar. We opportunistically keep the pages pinned
sometimes even if not currently mapped into the (what ttm calls) TT.
So I think these three for system memory make sense for us too. I
think that's similar (at least in spirit) to the dma_alloc cache you
have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
we could have something like PINNED or so. Although it's not
permanently pinned, so maybe that's confusing too.

> >> TTM was clearly missing that, resulting in a whole bunch of extra and
> >> rather complicated handling.
> >>
> >>> +#define DRM_MEM_SYSTEM 0
> >>> +#define DRM_MEM_STOLEN 1
> >> I think we need a better name for that.
> >>
> >> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> >> least for TTM this is the system memory currently GPU accessible.
> > Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
> > translation table window into system memory. Not the same thing at all.
>
> Thought so. The closest I have in mind is GTT, but everything else works
> as well.

Would your GPU_MAPPED above work for TT? I think we'll also need
STOLEN, I'm even hearing noises that there's going to be stolen for
discrete vram for us ... Also if we expand I guess we need to teach
ttm to cope with more, or maybe treat the DRM one as some kind of
sub-flavour.
-Daniel

>
> Christian.

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
On 30.07.19 11:38, Daniel Vetter wrote:
> On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
>> Yeah, that looks like a good start. Just a couple of random design
>> comments/requirements.
>>
>> First of all please restructure the changes so that you more or less
>> have the following:
>> 1. Adding of the new structures and functionality without any change to
>> existing code.
>> 2. Replacing the existing functionality in TTM and all of its drivers.
>> 3. Replacing the existing functionality in i915.
>>
>> This should make it much easier to review the new functionality when it
>> is not mixed with existing TTM stuff.
>>
>>
>> Second please completely drop the concept of gpu_offset or start of the
>> memory region like here:
>>> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
>> At least on AMD hardware we have the following address spaces which are
>> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus
>> addresses and physical addresses.
>>
>> Pushing a concept of a general GPU address space into the memory
>> management was a rather bad design mistake in TTM and we should not
>> repeat that here.
>>
>> A region should only consist of a size in bytes and (internal to the
>> region manager) allocations in that region.
>>
>>
>> Third please don't use any CPU or architecture specific types in any
>> data structures:
>>> +struct drm_mem_region {
>>> +   resource_size_t start; /* within GPU physical address space */
>>> +   resource_size_t io_start; /* BAR address (CPU accessible) */
>>> +   resource_size_t size;
>> I know that resource_size_t is mostly 64 bit on modern architectures, but
>> dGPUs are completely separate from the architecture and we always need
>> 64 bits here at least for AMD hardware.
>>
>> So this should either be always uint64_t, or something like
>> gpu_resource_size which depends on what the compiled in drivers require
>> if we really need that.
>>
>> And by the way: Please always use bytes for things like sizes and not
>> number of pages, because page size is again CPU/architecture specific and
>> GPU drivers don't necessarily care about that.
>>
>>
>> And here also a few direct comments on the code:
>>> +   union {
>>> +   struct drm_mm *mm;
>>> +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
>>> +   void *priv;
>>> +   };
>> Maybe just always use void *mm here.
>>
>>> +   spinlock_t move_lock;
>>> +   struct dma_fence *move;
>> That is TTM specific and I'm not sure if we want it in the common memory
>> management handling.
>>
>> If we want that here we should probably replace the lock with some rcu
>> and atomic fence pointer exchange first.
>>
>>> +/*
>>> + * Memory types for drm_mem_region
>>> + */
>> #define DRM_MEM_SWAP    ?
> btw what did you have in mind for this? Since we use shmem we kinda don't
> know whether the BO is actually swapped out or not, at least on the i915
> side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.

Yeah, the problem is not everybody can use shmem. For some use cases you 
have to use memory allocated through dma_alloc_coherent().

So to be able to swap this out you need a separate domain to copy it 
from whatever is backing it currently to shmem.

So we essentially have:
DRM_MEM_SYS_SWAPABLE
DRM_MEM_SYS_NOT_GPU_MAPPED
DRM_MEM_SYS_GPU_MAPPED

Or something like that.

>> TTM was clearly missing that, resulting in a whole bunch of extra and
>> rather complicated handling.
>>
>>> +#define DRM_MEM_SYSTEM 0
>>> +#define DRM_MEM_STOLEN 1
>> I think we need a better name for that.
>>
>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>> least for TTM this is the system memory currently GPU accessible.
> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
> translation table window into system memory. Not the same thing at all.

Thought so. The closest I have in mind is GTT, but everything else works 
as well.

Christian.

> -Daniel
>
>>
>> Thanks for looking into that,
>> Christian.
>>
>> On 30.07.19 02:32, Brian Welty wrote:
>>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>> I fixed the nit with ordering of header includes that Sam noted. ]
>>>
>>> This RFC series is a first implementation of some ideas expressed
>>> earlier on dri-devel [1].
>>>
>>> Some of the goals (open for much debate) are:
>>> - Create common base structure (subclass) for memory regions (patch #1)
>>> - Create common memory region types (patch #2)
>>> - Create common set of memory_region function callbacks (based on
>>>   ttm_mem_type_manager_funcs and intel_memory_regions_ops)
>>> - Create common helpers that operate on drm_mem_region to be leveraged
>>>   by both TTM drivers and i915, reducing code duplication
>>> - Above might start with refactoring ttm_bo_manager.c as these are
>>>   helpers for using drm_mm's range allocator and could be made to
>>>  

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Daniel Vetter
On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
> Yeah, that looks like a good start. Just a couple of random design 
> comments/requirements.
> 
> First of all please restructure the changes so that you more or less 
> have the following:
> 1. Adding of the new structures and functionality without any change to 
> existing code.
> 2. Replacing the existing functionality in TTM and all of its drivers.
> 3. Replacing the existing functionality in i915.
> 
> This should make it much easier to review the new functionality when it 
> is not mixed with existing TTM stuff.
> 
> 
> Second please completely drop the concept of gpu_offset or start of the 
> memory region like here:
> > drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
> At least on AMD hardware we have the following address spaces which are 
> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus 
> addresses and physical addresses.
> 
> Pushing a concept of a general GPU address space into the memory 
> management was a rather bad design mistake in TTM and we should not 
> repeat that here.
> 
> A region should only consist of a size in bytes and (internal to the 
> region manager) allocations in that region.
> 
> 
> Third please don't use any CPU or architecture specific types in any 
> data structures:
> > +struct drm_mem_region {
> > +   resource_size_t start; /* within GPU physical address space */
> > +   resource_size_t io_start; /* BAR address (CPU accessible) */
> > +   resource_size_t size;
> 
> I know that resource_size_t is mostly 64 bit on modern architectures, but 
> dGPUs are completely separate from the architecture and we always need 
> 64 bits here at least for AMD hardware.
> 
> So this should either be always uint64_t, or something like 
> gpu_resource_size which depends on what the compiled in drivers require 
> if we really need that.
> 
> And by the way: Please always use bytes for things like sizes and not 
> number of pages, because page size is again CPU/architecture specific and 
> GPU drivers don't necessarily care about that.
> 
> 
> And here also a few direct comments on the code:
> > +   union {
> > +   struct drm_mm *mm;
> > +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> > +   void *priv;
> > +   };
> Maybe just always use void *mm here.
> 
> > +   spinlock_t move_lock;
> > +   struct dma_fence *move;
> 
> That is TTM specific and I'm not sure if we want it in the common memory 
> management handling.
> 
> If we want that here we should probably replace the lock with some rcu 
> and atomic fence pointer exchange first.
> 
> > +/*
> > + * Memory types for drm_mem_region
> > + */
> 
> #define DRM_MEM_SWAP    ?

btw what did you have in mind for this? Since we use shmem we kinda don't
know whether the BO is actually swapped out or not, at least on the i915
side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.

> TTM was clearly missing that, resulting in a whole bunch of extra and 
> rather complicated handling.
> 
> > +#define DRM_MEM_SYSTEM 0
> > +#define DRM_MEM_STOLEN 1
> 
> I think we need a better name for that.
> 
> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at 
> least for TTM this is the system memory currently GPU accessible.

Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
translation table window into system memory. Not the same thing at all.
-Daniel

> 
> 
> Thanks for looking into that,
> Christian.
> 
> On 30.07.19 02:32, Brian Welty wrote:
> > [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
> >I fixed the nit with ordering of header includes that Sam noted. ]
> >
> > This RFC series is a first implementation of some ideas expressed
> > earlier on dri-devel [1].
> >
> > Some of the goals (open for much debate) are:
> >- Create common base structure (subclass) for memory regions (patch #1)
> >- Create common memory region types (patch #2)
> >- Create common set of memory_region function callbacks (based on
> >  ttm_mem_type_manager_funcs and intel_memory_regions_ops)
> >- Create common helpers that operate on drm_mem_region to be leveraged
> >  by both TTM drivers and i915, reducing code duplication
> >- Above might start with refactoring ttm_bo_manager.c as these are
> >  helpers for using drm_mm's range allocator and could be made to
> >  operate on DRM structures instead of TTM ones.
> >- Larger goal might be to make LRU management of GEM objects common, and
> >  migrate those fields into drm_mem_region and drm_gem_object structures.
> >
> > Patches 1-2 implement the proposed struct drm_mem_region and adds
> > associated common set of definitions for memory region type.
> >
> > Patch #3 is update to i915 and is based upon another series which is
> > in progress to add vram support to i915 [2].
> >
> > [1] https://lists.freedesktop.org/archives/dri-devel/2019-June/

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Daniel Vetter
On Tue, Jul 30, 2019 at 08:45:57AM +, Koenig, Christian wrote:
> Yeah, that looks like a good start. Just a couple of random design 
> comments/requirements.
> 
> First of all please restructure the changes so that you more or less 
> have the following:
> 1. Adding of the new structures and functionality without any change to 
> existing code.
> 2. Replacing the existing functionality in TTM and all of its drivers.
> 3. Replacing the existing functionality in i915.
> 
> This should make it much easier to review the new functionality when it 
> is not mixed with existing TTM stuff.
> 
> 
> Second please completely drop the concept of gpu_offset or start of the 
> memory region like here:
> > drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
> At least on AMD hardware we have the following address spaces which are 
> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus 
> addresses and physical addresses.
> 
> Pushing a concept of a general GPU address space into the memory 
> management was a rather bad design mistake in TTM and we should not 
> repeat that here.
> 
> A region should only consist of a size in bytes and (internal to the 
> region manager) allocations in that region.
> 
> 
> Third please don't use any CPU or architecture specific types in any 
> data structures:
> > +struct drm_mem_region {
> > +   resource_size_t start; /* within GPU physical address space */
> > +   resource_size_t io_start; /* BAR address (CPU accessible) */
> > +   resource_size_t size;
> 
> I know that resource_size_t is mostly 64 bit on modern architectures, but 
> dGPUs are completely separate from the architecture and we always need 
> 64 bits here at least for AMD hardware.
> 
> So this should either be always uint64_t, or something like 
> gpu_resource_size which depends on what the compiled in drivers require 
> if we really need that.
> 
> And by the way: Please always use bytes for things like sizes and not 
> number of pages, because page size is again CPU/architecture specific and 
> GPU drivers don't necessarily care about that.
> 
> 
> And here also a few direct comments on the code:
> > +   union {
> > +   struct drm_mm *mm;
> > +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> > +   void *priv;
> > +   };
> Maybe just always use void *mm here.

I'd say let's drop this outright, and handle private data by embedding this
structure in the right place. That's how we handle everything in drm now
as much as possible, and it's so much cleaner. I think ttm still loves
priv pointers a bit too much in some places.

> > +   spinlock_t move_lock;
> > +   struct dma_fence *move;
> 
> That is TTM specific and I'm not sure if we want it in the common memory 
> management handling.
> 
> If we want that here we should probably replace the lock with some rcu 
> and atomic fence pointer exchange first.

Yeah, not sure we want any of these details in this shared structure
either.

> 
> > +/*
> > + * Memory types for drm_mem_region
> > + */
> 
> #define DRM_MEM_SWAP    ?
> 
> TTM was clearly missing that, resulting in a whole bunch of extra and 
> rather complicated handling.
> 
> > +#define DRM_MEM_SYSTEM 0
> > +#define DRM_MEM_STOLEN 1
> 
> I think we need a better name for that.
> 
> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at 
> least for TTM this is the system memory currently GPU accessible.

Yeah I think the crux here of having a common drm_mem_region is how we
name stuff. I think what Brian didn't mention is that the goal here is to
have something we can use for managing memory with cgroups.

> Thanks for looking into that,
> Christian.
> 
> On 30.07.19 02:32, Brian Welty wrote:
> > [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
> >I fixed the nit with ordering of header includes that Sam noted. ]
> >
> > This RFC series is a first implementation of some ideas expressed
> > earlier on dri-devel [1].
> >
> > Some of the goals (open for much debate) are:
> >- Create common base structure (subclass) for memory regions (patch #1)
> >- Create common memory region types (patch #2)
> >- Create common set of memory_region function callbacks (based on
> >  ttm_mem_type_manager_funcs and intel_memory_regions_ops)
> >- Create common helpers that operate on drm_mem_region to be leveraged
> >  by both TTM drivers and i915, reducing code duplication
> >- Above might start with refactoring ttm_bo_manager.c as these are
> >  helpers for using drm_mm's range allocator and could be made to
> >  operate on DRM structures instead of TTM ones.
> >- Larger goal might be to make LRU management of GEM objects common, and
> >  migrate those fields into drm_mem_region and drm_gem_object structures.

I'm not sure how much of all that we really want in a drm_mem_region ...
Otherwise we just reimplement the same midlayer we have already, but with
a drm_ instead of ttm_ prefix.

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
Yeah, that looks like a good start. Just a couple of random design 
comments/requirements.

First of all please restructure the changes so that you more or less 
have the following:
1. Adding of the new structures and functionality without any change to 
existing code.
2. Replacing the existing functionality in TTM and all of its drivers.
3. Replacing the existing functionality in i915.

This should make it much easier to review the new functionality when it 
is not mixed with existing TTM stuff.


Second please completely drop the concept of gpu_offset or start of the 
memory region like here:
> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
At least on AMD hardware we have the following address spaces which are 
sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus 
addresses and physical addresses.

Pushing a concept of a general GPU address space into the memory 
management was a rather bad design mistake in TTM and we should not 
repeat that here.

A region should only consist of a size in bytes and (internal to the 
region manager) allocations in that region.


Third please don't use any CPU or architecture specific types in any 
data structures:
> +struct drm_mem_region {
> + resource_size_t start; /* within GPU physical address space */
> + resource_size_t io_start; /* BAR address (CPU accessible) */
> + resource_size_t size;

I know that resource_size_t is mostly 64 bit on modern architectures, but 
dGPUs are completely separate from the architecture and we always need 
64 bits here at least for AMD hardware.

So this should either be always uint64_t, or something like 
gpu_resource_size which depends on what the compiled in drivers require 
if we really need that.

And by the way: Please always use bytes for things like sizes and not 
number of pages, because page size is again CPU/architecture specific and 
GPU drivers don't necessarily care about that.


And here also a few direct comments on the code:
> + union {
> + struct drm_mm *mm;
> + /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> + void *priv;
> + };
Maybe just always use void *mm here.

> + spinlock_t move_lock;
> + struct dma_fence *move;

That is TTM specific and I'm not sure if we want it in the common memory 
management handling.

If we want that here we should probably replace the lock with some rcu 
and atomic fence pointer exchange first.

> +/*
> + * Memory types for drm_mem_region
> + */

#define DRM_MEM_SWAP    ?

TTM was clearly missing that, resulting in a whole bunch of extra and 
rather complicated handling.

> +#define DRM_MEM_SYSTEM   0
> +#define DRM_MEM_STOLEN   1

I think we need a better name for that.

STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at 
least for TTM this is the system memory currently GPU accessible.


Thanks for looking into that,
Christian.

On 30.07.19 02:32, Brian Welty wrote:
> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>I fixed the nit with ordering of header includes that Sam noted. ]
>
> This RFC series is a first implementation of some ideas expressed
> earlier on dri-devel [1].
>
> Some of the goals (open for much debate) are:
>- Create common base structure (subclass) for memory regions (patch #1)
>- Create common memory region types (patch #2)
>- Create common set of memory_region function callbacks (based on
>  ttm_mem_type_manager_funcs and intel_memory_regions_ops)
>- Create common helpers that operate on drm_mem_region to be leveraged
>  by both TTM drivers and i915, reducing code duplication
>- Above might start with refactoring ttm_bo_manager.c as these are
>  helpers for using drm_mm's range allocator and could be made to
>  operate on DRM structures instead of TTM ones.
>- Larger goal might be to make LRU management of GEM objects common, and
>  migrate those fields into drm_mem_region and drm_gem_object structures.
>
> Patches 1-2 implement the proposed struct drm_mem_region and adds
> associated common set of definitions for memory region type.
>
> Patch #3 is update to i915 and is based upon another series which is
> in progress to add vram support to i915 [2].
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224501.html
> [2] https://lists.freedesktop.org/archives/intel-gfx/2019-June/203649.html
>
> Brian Welty (3):
>drm: introduce new struct drm_mem_region
>drm: Introduce DRM_MEM defines for specifying type of drm_mem_region
>drm/i915: Update intel_memory_region to use nested drm_mem_region
>
>   drivers/gpu/drm/i915/gem/i915_gem_object.c|  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  2 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c   | 10 ++---
>   drivers/gpu/drm/i915/i915_gpu_error.c |  2 +-
>   drivers/gpu/drm/i915/i915_query.c |  2 +-
>   drivers/gpu/drm/i915/intel_memory_region.c

[RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-29 Thread Brian Welty
[ By request, resending to include amd-gfx + intel-gfx.  Since resending,
  I fixed the nit with ordering of header includes that Sam noted. ]

This RFC series is a first implementation of some ideas expressed
earlier on dri-devel [1].

Some of the goals (open for much debate) are:
  - Create common base structure (subclass) for memory regions (patch #1)
  - Create common memory region types (patch #2)
  - Create common set of memory_region function callbacks (based on
ttm_mem_type_manager_funcs and intel_memory_regions_ops)
  - Create common helpers that operate on drm_mem_region to be leveraged
by both TTM drivers and i915, reducing code duplication
  - Above might start with refactoring ttm_bo_manager.c as these are
helpers for using drm_mm's range allocator and could be made to
operate on DRM structures instead of TTM ones.
  - Larger goal might be to make LRU management of GEM objects common, and
    migrate those fields into drm_mem_region and drm_gem_object structures.

Patches 1-2 implement the proposed struct drm_mem_region and adds
associated common set of definitions for memory region type.

Patch #3 is update to i915 and is based upon another series which is
in progress to add vram support to i915 [2].

[1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224501.html
[2] https://lists.freedesktop.org/archives/intel-gfx/2019-June/203649.html

Brian Welty (3):
  drm: introduce new struct drm_mem_region
  drm: Introduce DRM_MEM defines for specifying type of drm_mem_region
  drm/i915: Update intel_memory_region to use nested drm_mem_region

 drivers/gpu/drm/i915/gem/i915_gem_object.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 10 ++---
 drivers/gpu/drm/i915/i915_gpu_error.c |  2 +-
 drivers/gpu/drm/i915/i915_query.c |  2 +-
 drivers/gpu/drm/i915/intel_memory_region.c| 10 +++--
 drivers/gpu/drm/i915/intel_memory_region.h| 19 +++--
 drivers/gpu/drm/i915/intel_region_lmem.c  | 26 ++---
 .../drm/i915/selftests/intel_memory_region.c  |  8 ++--
 drivers/gpu/drm/ttm/ttm_bo.c  | 34 +---
 drivers/gpu/drm/ttm/ttm_bo_manager.c  | 14 +++
 drivers/gpu/drm/ttm/ttm_bo_util.c | 11 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c |  8 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c|  4 +-
 include/drm/drm_mm.h  | 39 ++-
 include/drm/ttm/ttm_bo_api.h  |  2 +-
 include/drm/ttm/ttm_bo_driver.h   | 16 
 include/drm/ttm/ttm_placement.h   |  8 ++--
 18 files changed, 124 insertions(+), 93 deletions(-)

-- 
2.21.0
