Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-16 Thread Koenig, Christian
Am 16.04.19 um 13:03 schrieb Daniel Vetter:
> On Tue, Apr 16, 2019 at 12:05 PM Koenig, Christian
>  wrote:
>> Am 15.04.19 um 21:17 schrieb Daniel Vetter:
>>> On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann  
>>> wrote:
 Hi

 Am 15.04.19 um 17:54 schrieb Daniel Vetter:
> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
>> [SNIP]
>>> I'd expect the same applies to the vbox driver.
>>>
>>> Dunno about the other drm drivers and the fbdev drivers you plan to
>>> migrate over.
>> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
>> for a server. It's similar with mgag200 HW. The old fbdev-supported
>> devices are all somewhere in the range between cirrus and bochs. Some
>> drivers would probably benefit from the cirrus approach, some could use
>> VRAM directly.
> I think for dumb scanout with vram all we need is:
> - pin framebuffers, which potentially moves the underlying bo into vram
> - unpin framebuffers (which is just accounting, we don't want to move the
> bo on every flip!)
> - if a pin doesn't find enough space, move one of the unpinned bo still
> resident in vram out
 For dumb buffers, I'd expect userspace to have a working set of only a
 front and back buffer (plus maybe a third one). This working set has to
 reside in VRAM for performance reasons; non-WS BOs from other userspace
 programs don't have to be.

 So we could simplify even more: if there's not enough free space in
 vram, remove all unpinned BOs. This would avoid the need to implement
 an LRU algorithm or another eviction strategy. Userspace with a WS
 larger than the absolute VRAM would see degraded performance but
 otherwise still work.
>>> You still need a list of unpinned bo, and the lru scan algorithm is
>>> just a few lines of code more than unpinning everything. Plus it'd be
>>> a neat example of the drm_mm scan logic. Given that some folks might
>>> think that not having lru eviction is a problem and go write their
>>> own, I'd just add it. But up to you. Plus with ttm you get it no
>>> matter what.
>> Well how about making a drm_lru component which just does the following
>> (and nothing else, please :):
>>
>> 1. Keep a list of objects and a spinlock protecting the list.
>>
>> 2. Offer helpers for adding/deleting/moving stuff from the list.
>>
>> 3. Offer functionality to do the necessary dance of picking the first
>> entry where we can trylock its reservation object.
>>
>> 4. Offer bulk move functionality similar to what TTM does at the moment
>> (can be implemented later on).
> At a basic level, this is list_head of drm_gem_object. Not sure that's
> all that useful (outside of the fairly simplistic vram helpers we're
> discussing here). The reason is that there's a lot of trickery
> in selecting the best object to evict in any given case (e.g.
> do you want to use drm_mm scanning, or is there a slab of objects you
> prefer to throw out?). Given that, I'm not sure
> implementing the entire scanning/drm_lru logic is beneficial.
>
> The magic trylock+kref_get_unless_zero otoh could be worth
> implementing as a helper, together with a note about how to build your
> own custom lru algorithm. Same for some bulk/nonblocking movement
> helpers maybe. Both not really needed for the dumb scanout vram
> helpers we're discussing here.

Yeah, exactly that's what I wanted to get towards as well.

This magic trylock+kref_get is what needs to be handled correctly by all 
drivers which implement an LRU.
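For illustration only, the trylock + kref_get_unless_zero pattern being discussed can be modeled in userspace C, with a pthread mutex standing in for the reservation object and an atomic counter standing in for struct kref. All names here (struct bo, pick_first_evictable) are invented for the sketch, not actual DRM API:

```c
/* Userspace sketch of the "trylock + kref_get_unless_zero" pick that an
 * LRU implementation needs: walk the list oldest-first, skip objects whose
 * reservation lock is contended or whose refcount already hit zero. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct bo {
    pthread_mutex_t resv;      /* stands in for the reservation object */
    atomic_int      refcount;  /* stands in for struct kref */
    struct bo      *next;      /* LRU list link, least recently used first */
};

/* kref_get_unless_zero(): take a reference only if the object is still alive. */
static int bo_get_unless_zero(struct bo *bo)
{
    int ref = atomic_load(&bo->refcount);
    while (ref > 0)
        if (atomic_compare_exchange_weak(&bo->refcount, &ref, ref + 1))
            return 1;
    return 0;
}

/* Pick the first evictable BO: trylock its reservation, then grab a ref.
 * Returns the locked, referenced BO, or NULL if nothing is evictable. */
struct bo *pick_first_evictable(struct bo *lru_head)
{
    for (struct bo *bo = lru_head; bo; bo = bo->next) {
        if (pthread_mutex_trylock(&bo->resv) != 0)
            continue;                    /* contended: someone is using it */
        if (!bo_get_unless_zero(bo)) {
            pthread_mutex_unlock(&bo->resv);
            continue;                    /* already on its way to destruction */
        }
        return bo;                       /* caller evicts, unlocks, puts ref */
    }
    return NULL;
}
```

The point of the dance: the trylock keeps the walk lock-order safe under the list lock, and the get-unless-zero guards against racing with final object destruction.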

LRU bulk move is something which is tricky to get right as well, but so
far only amdgpu uses it, so it only makes sense to share once somebody
else wants the same approach.

Christian.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-16 Thread Daniel Vetter
On Tue, Apr 16, 2019 at 12:05 PM Koenig, Christian
 wrote:
>
> Am 15.04.19 um 21:17 schrieb Daniel Vetter:
> > On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann  
> > wrote:
> >> Hi
> >>
> >> Am 15.04.19 um 17:54 schrieb Daniel Vetter:
> >>> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
>  Hi
> 
>  Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
>  [SNIP]
> > I'd expect the same applies to the vbox driver.
> >
> > Dunno about the other drm drivers and the fbdev drivers you plan to
> > migrate over.
>  The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
>  for a server. It's similar with mgag200 HW. The old fbdev-supported
>  devices are all somewhere in the range between cirrus and bochs. Some
>  drivers would probably benefit from the cirrus approach, some could use
>  VRAM directly.
> >>> I think for dumb scanout with vram all we need is:
> >>> - pin framebuffers, which potentially moves the underlying bo into vram
> >>> - unpin framebuffers (which is just accounting, we don't want to move the
> >>>bo on every flip!)
> >>> - if a pin doesn't find enough space, move one of the unpinned bo still
> >>>resident in vram out
> >> For dumb buffers, I'd expect userspace to have a working set of only a
> >> front and back buffer (plus maybe a third one). This working set has to
> >> reside in VRAM for performance reasons; non-WS BOs from other userspace
> >> programs don't have to be.
> >>
> >> So we could simplify even more: if there's not enough free space in
> >> vram, remove all unpinned BOs. This would avoid the need to implement
> >> an LRU algorithm or another eviction strategy. Userspace with a WS
> >> larger than the absolute VRAM would see degraded performance but
> >> otherwise still work.
> > You still need a list of unpinned bo, and the lru scan algorithm is
> > just a few lines of code more than unpinning everything. Plus it'd be
> > a neat example of the drm_mm scan logic. Given that some folks might
> > think that not having lru eviction is a problem and go write their
> > own, I'd just add it. But up to you. Plus with ttm you get it no
> > matter what.
>
> Well how about making a drm_lru component which just does the following
> (and nothing else, please :):
>
> 1. Keep a list of objects and a spinlock protecting the list.
>
> 2. Offer helpers for adding/deleting/moving stuff from the list.
>
> 3. Offer functionality to do the necessary dance of picking the first
> entry where we can trylock its reservation object.
>
> 4. Offer bulk move functionality similar to what TTM does at the moment
> (can be implemented later on).

At a basic level, this is list_head of drm_gem_object. Not sure that's
all that useful (outside of the fairly simplistic vram helpers we're
discussing here). The reason is that there's a lot of trickery
in selecting the best object to evict in any given case (e.g.
do you want to use drm_mm scanning, or is there a slab of objects you
prefer to throw out?). Given that, I'm not sure
implementing the entire scanning/drm_lru logic is beneficial.

The magic trylock+kref_get_unless_zero otoh could be worth
implementing as a helper, together with a note about how to build your
own custom lru algorithm. Same for some bulk/nonblocking movement
helpers maybe. Both not really needed for the dumb scanout vram
helpers we're discussing here.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-16 Thread Koenig, Christian
Am 15.04.19 um 21:17 schrieb Daniel Vetter:
> On Mon, Apr 15, 2019 at 6:21 PM Thomas Zimmermann  wrote:
>> Hi
>>
>> Am 15.04.19 um 17:54 schrieb Daniel Vetter:
>>> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
 Hi

 Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
 [SNIP]
> I'd expect the same applies to the vbox driver.
>
> Dunno about the other drm drivers and the fbdev drivers you plan to
> migrate over.
 The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
 for a server. It's similar with mgag200 HW. The old fbdev-supported
 devices are all somewhere in the range between cirrus and bochs. Some
 drivers would probably benefit from the cirrus approach, some could use
 VRAM directly.
>>> I think for dumb scanout with vram all we need is:
>>> - pin framebuffers, which potentially moves the underlying bo into vram
>>> - unpin framebuffers (which is just accounting, we don't want to move the
>>>bo on every flip!)
>>> - if a pin doesn't find enough space, move one of the unpinned bo still
>>>resident in vram out
>> For dumb buffers, I'd expect userspace to have a working set of only a
>> front and back buffer (plus maybe a third one). This working set has to
>> reside in VRAM for performance reasons; non-WS BOs from other userspace
>> programs don't have to be.
>>
>> So we could simplify even more: if there's not enough free space in
>> vram, remove all unpinned BOs. This would avoid the need to implement
>> an LRU algorithm or another eviction strategy. Userspace with a WS
>> larger than the absolute VRAM would see degraded performance but
>> otherwise still work.
> You still need a list of unpinned bo, and the lru scan algorithm is
> just a few lines of code more than unpinning everything. Plus it'd be
> a neat example of the drm_mm scan logic. Given that some folks might
> think that not having lru eviction is a problem and go write their
> own, I'd just add it. But up to you. Plus with ttm you get it no
> matter what.

Well how about making a drm_lru component which just does the following
(and nothing else, please :):

1. Keep a list of objects and a spinlock protecting the list.

2. Offer helpers for adding/deleting/moving stuff from the list.

3. Offer functionality to do the necessary dance of picking the first
entry where we can trylock its reservation object.

4. Offer bulk move functionality similar to what TTM does at the moment 
(can be implemented later on).

Regards,
Christian.

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-15 Thread Thomas Zimmermann
Hi

Am 15.04.19 um 17:54 schrieb Daniel Vetter:
> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
>>>   Hi,
>>>
 If not for TTM, what would be the alternative? One VMA manager per
 memory region per device?
>>>
>>> Depends pretty much on the device.
>>>
>>> The cirrus is a display device with only 4 MB of vram.  You can't fit
>>> much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
>>> of the video memory already.  Which is why the cirrus driver (before the
>>> rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
>>> is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
>>> vram any more.  gem objects are managed with the shmem helpers instead
>>> and the active framebuffer is blitted to vram.
>>>
>>> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
>>> configured to have up to 256 MB.  Plenty of room even for multiple 4k
>>> framebuffers if needed.  So for the bochs driver all the ttm bo
>>> migration logic is not needed, it could just store everything in vram.
>>>
>>> A set of drm_gem_vram_* helpers would do the job for bochs.
>>
>> Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
>> is probably a better name for the data structures.
> 
> +1 on drm_gem_vram_* naming convention - we want to describe what it's
> for, not how it's implemented.

OK, great.

>>> I'd expect the same applies to the vbox driver.
>>>
>>> Dunno about the other drm drivers and the fbdev drivers you plan to
>>> migrate over.
>>
>> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
>> for a server. It's similar with mgag200 HW. The old fbdev-supported
>> devices are all somewhere in the range between cirrus and bochs. Some
>> drivers would probably benefit from the cirrus approach, some could use
>> VRAM directly.
> 
> I think for dumb scanout with vram all we need is:
> - pin framebuffers, which potentially moves the underlying bo into vram
> - unpin framebuffers (which is just accounting, we don't want to move the
>   bo on every flip!)
> - if a pin doesn't find enough space, move one of the unpinned bo still
>   resident in vram out

For dumb buffers, I'd expect userspace to have a working set of only a
front and back buffer (plus maybe a third one). This working set has to
reside in VRAM for performance reasons; non-WS BOs from other userspace
programs don't have to be.

So we could simplify even more: if there's not enough free space in
vram, remove all unpinned BOs. This would avoid the need to implement
an LRU algorithm or another eviction strategy. Userspace with a WS
larger than the absolute VRAM would see degraded performance but
otherwise still work.
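The simplification described above can be sketched in a few lines of userspace C; the VRAM "allocator" is just a byte counter and every name is invented for the example:

```c
/* Sketch of the no-LRU simplification: when a pin can't find room in
 * VRAM, evict every unpinned resident BO and retry once. */
#include <stddef.h>

struct vbo  { size_t size; int pinned; int in_vram; };
struct vram { size_t total, used; };

static int vram_fits(struct vram *v, size_t sz)
{
    return v->used + sz <= v->total;
}

/* Move every unpinned resident BO out of VRAM. */
static void evict_all_unpinned(struct vram *v, struct vbo *bos, int n)
{
    for (int i = 0; i < n; i++)
        if (bos[i].in_vram && !bos[i].pinned) {
            v->used -= bos[i].size;
            bos[i].in_vram = 0;
        }
}

/* Pin a BO for scanout: make it resident, evicting everything unpinned
 * first if VRAM is full.  Returns 0 on success, -1 if the working set
 * simply doesn't fit. */
int vbo_pin(struct vram *v, struct vbo *bos, int n, struct vbo *bo)
{
    if (!bo->in_vram) {
        if (!vram_fits(v, bo->size))
            evict_all_unpinned(v, bos, n);
        if (!vram_fits(v, bo->size))
            return -1;
        v->used += bo->size;
        bo->in_vram = 1;
    }
    bo->pinned = 1;
    return 0;
}
```

Compared with an LRU, the only cost is that a too-aggressive eviction throws out BOs that would have stayed resident; as noted above, an oversized working set degrades to copy-in-on-pin but keeps working.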

Best regards
Thomas

> - no pipelining, no support for dma engines (it's all cpu copies anway)
> - a simple drm_mm should be good enough to manage the vram, no need for
>   the ttm style abstraction over how memory is manged
> - also just bake in the lru eviction list and algorithm
> - probably good to have built-in support for redirecting the mmap between
>   shmem and iomem.
> - anything that needs pipelining or copy engines would be out of scope for
>   these helpers
> 
> I think for starting points we can go with a copypasted version of the
> various ttm implementations we already have, and then simplify from there
> as needed. Or just start fresh if that's too complicated, due to the issue
> Christian highlighted.
> -Daniel
> 
>> Best regards
>> Thomas
>>
>>>
>>> cheers,
>>>   Gerd
>>>
>>> [1] Note: The page-flip migration logic is present in some of the other
>>> drivers too, not sure whether they actually need that due to being low
>>> on vram too or whether they just copied the old cirrus code ...
>>>
>>> [2] The other reason is that this allows converting formats at blit time,
>>> which helps to deal with some cirrus hardware limitations.
>>>
>>
>> -- 
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
>> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
>> HRB 21284 (AG Nürnberg)
>>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)




Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-15 Thread Daniel Vetter
On Mon, Apr 15, 2019 at 05:54:30PM +0200, Daniel Vetter wrote:
> On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
> > >   Hi,
> > > 
> > >> If not for TTM, what would be the alternative? One VMA manager per
> > >> memory region per device?
> > > 
> > > Depends pretty much on the device.
> > > 
> > > The cirrus is a display device with only 4 MB of vram.  You can't fit
> > > much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
> > > of the video memory already.  Which is why the cirrus driver (before the
> > > rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
> > > is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
> > > vram any more.  gem objects are managed with the shmem helpers instead
> > > and the active framebuffer is blitted to vram.
> > > 
> > > The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> > > configured to have up to 256 MB.  Plenty of room even for multiple 4k
> > > framebuffers if needed.  So for the bochs driver all the ttm bo
> > > migration logic is not needed, it could just store everything in vram.
> > > 
> > > A set of drm_gem_vram_* helpers would do the job for bochs.
> > 
> > Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
> > is probably a better name for the data structures.
> 
> +1 on drm_gem_vram_* naming convention - we want to describe what it's
> for, not how it's implemented.
> 
> > > I'd expect the same applies to the vbox driver.
> > > 
> > > Dunno about the other drm drivers and the fbdev drivers you plan to
> > > migrate over.
> > 
> > The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
> > for a server. It's similar with mgag200 HW. The old fbdev-supported
> > devices are all somewhere in the range between cirrus and bochs. Some
> > drivers would probably benefit from the cirrus approach, some could use
> > VRAM directly.
> 
> I think for dumb scanout with vram all we need is:
> - pin framebuffers, which potentially moves the underlying bo into vram
> - unpin framebuffers (which is just accounting, we don't want to move the
>   bo on every flip!)
> - if a pin doesn't find enough space, move one of the unpinned bo still
>   resident in vram out
> - no pipelining, no support for dma engines (it's all cpu copies anyway)
> - a simple drm_mm should be good enough to manage the vram, no need for
>   the ttm style abstraction over how memory is managed
> - also just bake in the lru eviction list and algorithm
> - probably good to have built-in support for redirecting the mmap between
>   shmem and iomem.
> - anything that needs pipelining or copy engines would be out of scope for
>   these helpers

One more:

- Only bother supporting hw that needs contiguous scanout buffers in VRAM.
  I think hw that has pagetables for scanout (and everything else really)
  is sufficiently different that cramming them into the same helpers
  doesn't make much sense: You want to manage your VRAM with a buddy
  allocator at a page level, plus you need to manage your pagetables, and
  all is likely very hw specific anyway. Plus I haven't seen such hw that
  doesn't come with a real gpu attached :-)

-Daniel

> 
> I think for starting points we can go with a copypasted version of the
> various ttm implementations we already have, and then simplify from there
> as needed. Or just start fresh if that's too complicated, due to the issue
> Christian highlighted.
> -Daniel
> 
> > Best regards
> > Thomas
> > 
> > > 
> > > cheers,
> > >   Gerd
> > > 
> > > [1] Note: The page-flip migration logic is present in some of the other
> > > drivers too, not sure whether they actually need that due to being low
> > > on vram too or whether they just copied the old cirrus code ...
> > > 
> > > [2] The other reason is that this allows converting formats at blit time,
> > > which helps to deal with some cirrus hardware limitations.
> > > 
> > 
> > -- 
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> > HRB 21284 (AG Nürnberg)
> > 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-15 Thread Daniel Vetter
On Tue, Apr 09, 2019 at 09:50:40AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
> >   Hi,
> > 
> >> If not for TTM, what would be the alternative? One VMA manager per
> >> memory region per device?
> > 
> > Depends pretty much on the device.
> > 
> > The cirrus is a display device with only 4 MB of vram.  You can't fit
> > much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
> > of the video memory already.  Which is why the cirrus driver (before the
> > rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
> > is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
> > vram any more.  gem objects are managed with the shmem helpers instead
> > and the active framebuffer is blitted to vram.
> > 
> > The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> > configured to have up to 256 MB.  Plenty of room even for multiple 4k
> > framebuffers if needed.  So for the bochs driver all the ttm bo
> > migration logic is not needed, it could just store everything in vram.
> > 
> > A set of drm_gem_vram_* helpers would do the job for bochs.
> 
> Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
> is probably a better name for the data structures.

+1 on drm_gem_vram_* naming convention - we want to describe what it's
for, not how it's implemented.

> > I'd expect the same applies to the vbox driver.
> > 
> > Dunno about the other drm drivers and the fbdev drivers you plan to
> > migrate over.
> 
> The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
> for a server. It's similar with mgag200 HW. The old fbdev-supported
> devices are all somewhere in the range between cirrus and bochs. Some
> drivers would probably benefit from the cirrus approach, some could use
> VRAM directly.

I think for dumb scanout with vram all we need is:
- pin framebuffers, which potentially moves the underlying bo into vram
- unpin framebuffers (which is just accounting, we don't want to move the
  bo on every flip!)
- if a pin doesn't find enough space, move one of the unpinned bo still
  resident in vram out
- no pipelining, no support for dma engines (it's all cpu copies anyway)
- a simple drm_mm should be good enough to manage the vram, no need for
  the ttm style abstraction over how memory is managed
- also just bake in the lru eviction list and algorithm
- probably good to have built-in support for redirecting the mmap between
  shmem and iomem.
- anything that needs pipelining or copy engines would be out of scope for
  these helpers
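The first three bullets above (pin moves in, unpin is pure accounting, a failing pin evicts one unpinned resident BO at a time) can be sketched as a toy userspace model; the allocator is a byte counter and all names are invented:

```c
/* Sketch of the pin/unpin flow for dumb scanout: unpin never moves a BO
 * (no migration per flip!), and only a failing pin evicts - one
 * unpinned resident BO at a time, oldest first. */
#include <stddef.h>

struct sbo     { size_t size; int pincount; int resident; };
struct vram_mm { size_t total, used; };

/* Unpin: bookkeeping only - the BO stays resident in VRAM. */
void sbo_unpin(struct sbo *bo)
{
    bo->pincount--;
}

/* Pin: make the BO resident, evicting unpinned residents in LRU order
 * (lru[0] is the oldest) only when there is no room. */
int sbo_pin(struct vram_mm *mm, struct sbo **lru, int n, struct sbo *bo)
{
    if (!bo->resident) {
        for (int i = 0; i < n && mm->used + bo->size > mm->total; i++) {
            struct sbo *victim = lru[i];
            if (victim->resident && victim->pincount == 0) {
                mm->used -= victim->size;       /* "move out" of VRAM */
                victim->resident = 0;
            }
        }
        if (mm->used + bo->size > mm->total)
            return -1;                  /* pinned working set too large */
        mm->used += bo->size;
        bo->resident = 1;
    }
    bo->pincount++;
    return 0;
}
```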

I think for starting points we can go with a copypasted version of the
various ttm implementations we already have, and then simplify from there
as needed. Or just start fresh if that's too complicated, due to the issue
Christian highlighted.
-Daniel

> Best regards
> Thomas
> 
> > 
> > cheers,
> >   Gerd
> > 
> > [1] Note: The page-flip migration logic is present in some of the other
> > drivers too, not sure whether they actually need that due to being low
> > on vram too or whether they just copied the old cirrus code ...
> > 
> > [2] The other reason is that this allows converting formats at blit time,
> > which helps to deal with some cirrus hardware limitations.
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
> 




-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread Koenig, Christian
Am 08.04.19 um 13:59 schrieb Thomas Zimmermann:
[SNIP]
> If not for TTM, what would be the alternative? One VMA manager per
> memory region per device?

Since everybody relevant seems to be on this mail thread anyway, let's use
it a bit for brainstorming what a possible replacement for TTM should
look like.

Well for simple drivers like qemu/bochs and cirrus the answer is to not 
use it at all. E.g. VRAM is only used for scanout and unprivileged 
userspace should not mess with it at all. In this case we don't need 
dynamic eviction and so also don't need TTM.

That leaves us with the more complex drivers, like radeon, amdgpu,
nouveau and maybe some of the ARM-based stuff, with vmwgfx being a bit
special here.

Now I can summarize the requirements for at least amdgpu in the
following way:
1. We need to be able to allocate memory objects in different locations.
2. We need to be able to move memory objects between different
locations.
3. We need some LRU component which tells us what to evict when memory
in a location becomes too tight.

Now for lessons learned we should at least avoid the following design 
pitfalls:
A) DON'T make it a layered design! Layers are for making cake, not software.

B) DON'T make it an "eierlegende Wollmilchsau" (German idiom for a
do-everything solution). E.g. don't try to solve every single corner
case in one piece of software.
     Let's make components which solve one specific problem.

C) Pipeline everything! E.g. the hardware we deal with is asynchronous
by design. Blocking for the hardware to finish in the common components
themselves is an absolute no-go.
     If a driver wants to do something synchronous it should wait itself.
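As a toy illustration of point C together with requirements 1 and 2, an interface where the common code never blocks and every move just returns a fence might look like the userspace sketch below. All types and names are invented; the LRU of requirement 3 is a separate component and omitted here:

```c
/* Illustrative-only model: moves are asynchronous and hand back a
 * fence; only a driver that *wants* to be synchronous waits on it. */
#include <stddef.h>

typedef int mem_domain;                 /* e.g. 0 = GTT, 1 = VRAM */

struct fence { int signaled; };         /* stand-in for a real fence */

struct mem_obj {
    mem_domain   domain;
    size_t       size;
    struct fence move_fence;            /* last pending move, if any */
};

/* Requirement 1: placement is chosen at allocation time. */
void obj_init(struct mem_obj *obj, size_t size, mem_domain dom)
{
    obj->domain = dom;
    obj->size = size;
    obj->move_fence.signaled = 1;       /* nothing pending */
}

/* Requirement 2 + pitfall C: schedule the move and return immediately.
 * A real driver would enqueue a copy on a DMA engine here. */
struct fence *obj_move(struct mem_obj *obj, mem_domain to)
{
    obj->domain = to;                   /* model: data arrives "later" */
    obj->move_fence.signaled = 0;
    return &obj->move_fence;
}

/* Waiting is the caller's decision, never the common component's. */
void fence_wait(struct fence *f)
{
    f->signaled = 1;                    /* model: the copy completes */
}
```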

Those comments were not really intended for you, Thomas, but I had to
write them down somewhere :)

Regards,
Christian.


Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread Christian König

Am 09.04.19 um 10:29 schrieb kra...@redhat.com:
>    Hi,
>
>>> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
>>> configured to have up to 256 MB.  Plenty of room even for multiple 4k
>>> framebuffers if needed.  So for the bochs driver all the ttm bo
>>> migration logic is not needed, it could just store everything in vram.
>>
>> To clarify I assume you mean it doesn't need the migrate each bo
>> logic, but it still needs the when VRAM fills up migrate stuff logic.
>
> I think even the "when vram fills up" logic isn't that important.  The
> driver has no acceleration so there is nothing to store besides dumb
> framebuffers, and the vram size can easily be increased if needed.

Yeah, cause in this particular case it is not even real VRAM.

In this case VRAM is backed by system memory which in turn is stolen
from the host system, isn't it?

So just have enough for a framebuffer and don't place anything else in
there.

Regards,
Christian.

> cheers,
>    Gerd

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread kra...@redhat.com
  Hi,

> > The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> > configured to have up to 256 MB.  Plenty of room even for multiple 4k
> > framebuffers if needed.  So for the bochs driver all the ttm bo
> > migration logic is not needed, it could just store everything in vram.
> 
> To clarify I assume you mean it doesn't need the migrate each bo
> logic, but it still needs the when VRAM fills up migrate stuff logic.

I think even the "when vram fills up" logic isn't that important.  The
driver has no acceleration so there is nothing to store besides dumb
framebuffers, and the vram size can easily be increased if needed.

cheers,
  Gerd


Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread Thomas Zimmermann
Hi

Am 09.04.19 um 09:12 schrieb kra...@redhat.com:
>   Hi,
> 
>> If not for TTM, what would be the alternative? One VMA manager per
>> memory region per device?
> 
> Depends pretty much on the device.
> 
> The cirrus is a display device with only 4 MB of vram.  You can't fit
> much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
> of the video memory already.  Which is why the cirrus driver (before the
> rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
> is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
> vram any more.  gem objects are managed with the shmem helpers instead
> and the active framebuffer is blitted to vram.
> 
> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> configured to have up to 256 MB.  Plenty of room even for multiple 4k
> framebuffers if needed.  So for the bochs driver all the ttm bo
> migration logic is not needed, it could just store everything in vram.
> 
> A set of drm_gem_vram_* helpers would do the job for bochs.

Thanks for clarifying. drm_gem_vram_* (and drm_vram_mm for Simple TTM)
is probably a better name for the data structures.

> I'd expect the same applies to the vbox driver.
> 
> Dunno about the other drm drivers and the fbdev drivers you plan to
> migrate over.

The AST HW can support up to 512 MiB, but 32-64 MiB seems more realistic
for a server. It's similar with mgag200 HW. The old fbdev-supported
devices are all somewhere in the range between cirrus and bochs. Some
drivers would probably benefit from the cirrus approach, some could use
VRAM directly.

Best regards
Thomas

> 
> cheers,
>   Gerd
> 
> [1] Note: The page-flip migration logic is present in some of the other
> drivers too, not sure whether they actually need that due to being low
> on vram too or whether they just copied the old cirrus code ...
> 
> [2] The other reason is that this allows converting formats at blit time,
> which helps to deal with some cirrus hardware limitations.
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)




Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread Dave Airlie
On Tue, 9 Apr 2019 at 17:12, kra...@redhat.com  wrote:
>
>   Hi,
>
> > If not for TTM, what would be the alternative? One VMA manager per
> > memory region per device?
>
> Depends pretty much on the device.
>
> The cirrus is a display device with only 4 MB of vram.  You can't fit
> much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
> of the video memory already.  Which is why the cirrus driver (before the
> rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
> is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
> vram any more.  gem objects are managed with the shmem helpers instead
> and the active framebuffer is blitted to vram.
>
> The qemu stdvga (bochs driver) has 16 MB vram by default and can be
> configured to have up to 256 MB.  Plenty of room even for multiple 4k
> framebuffers if needed.  So for the bochs driver all the ttm bo
> migration logic is not needed, it could just store everything in vram.

To clarify, I assume you mean it doesn't need the migrate-each-bo
logic, but it still needs the when-VRAM-fills-up migration logic.

Dave.

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-09 Thread kra...@redhat.com
  Hi,

> If not for TTM, what would be the alternative? One VMA manager per
> memory region per device?

Depends pretty much on the device.

The cirrus is a display device with only 4 MB of vram.  You can't fit
much in there.  A single 1024x768 @ 24bpp framebuffer needs more than 50%
of the video memory already.  Which is why the cirrus driver (before the
rewrite) had to migrate buffers from/to vram on every page flip[1].  Which
is one[2] of the reasons why cirrus (after rewrite) doesn't ttm-manage the
vram any more.  gem objects are managed with the shmem helpers instead
and the active framebuffer is blitted to vram.

The qemu stdvga (bochs driver) has 16 MB vram by default and can be
configured to have up to 256 MB.  Plenty of room even for multiple 4k
framebuffers if needed.  So for the bochs driver all the ttm bo
migration logic is not needed, it could just store everything in vram.

A set of drm_gem_vram_* helpers would do the job for bochs.

I'd expect the same applies to the vbox driver.

Dunno about the other drm drivers and the fbdev drivers you plan to
migrate over.

cheers,
  Gerd

[1] Note: The page-flip migration logic is present in some of the other
drivers too; not sure whether they actually need it due to being low
on vram too or whether they just copied the old cirrus code ...

[2] The other reason is that this allows converting formats at blit time,
which helps to deal with some cirrus hardware limitations.


Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-08 Thread Thomas Zimmermann
Hi

Am 08.04.19 um 13:10 schrieb Koenig, Christian:
> Well, the first problem is that I'm not sure this is a good idea. Essentially
> we want to get rid of TTM in the long run.
> 
> On the other hand this work might aid with that goal, so it might be 
> worth a try.

I see. I'm actually interested in porting some of the fbdev driver code
over to DRM. So far, that patch set uses TTM. As there's so much
duplicated code among drivers, I was asked to merge some of the code
into helpers first.

If not for TTM, what would be the alternative? One VMA manager per
memory region per device?

With the patch set applied, there are very few TTM calls left in the
drivers, and some of them can even be replaced by helpers. Short of that
deeper cleanup, maybe we can merge these helpers under a different API
name? Something that does not imply a TTM-based implementation? Doing so
would ease a later rewrite with non-TTM code. Suggestions?

> Second is that this might actually not work out of the box. The problem is here:
>> +/* TODO: This test used to be performed by drivers, but can
>> + * this actually happen? If so, should we put the check into
>> + * drm_gem_ttm_of_gem()? */
>> +if (!drm_is_gem_ttm(bo))
>> +return;
> 
> Yeah, this test is mandatory because TTM can by itself present buffer
> objects which don't belong to the driver being called.
> 
> E.g. we sometimes have BOs which don't belong to the current driver on
> a driver-specific LRU. A totally brain-dead design if you ask me, but
> that's how it is.
> 
> Not 100% sure, but by converting all drivers to use a common GEM_TTM 
> backend you might actually break that test.
> 
> I'm not sure if that is actually a problem in the real world, it most 
> likely isn't. But I still won't bet on it without being able to test this.

Thanks for explaining. So this test is best kept in driver-specific
code.

Best regards
Thomas

> Regards,
> Christian.
> 
> Am 08.04.19 um 11:21 schrieb Thomas Zimmermann:
>> Several simple framebuffer drivers copy most of the TTM code from each
>> other. The implementation is always the same, except for the names of
>> some data structures.
>>
>> As recently discussed, this patch set provides generic TTM memory-
>> management code for framebuffers with dedicated video memory. It further
>> converts the respective drivers to the generic code. The shared code
>> is basically the same implementation as the one copied among individual
>> drivers.
>>
>> The patch set contains two major changes: first, it introduces
>> |struct drm_gem_ttm_object| and helpers. It's a GEM object that is
>> backed by TTM-managed memory. The type's purpose is somewhat similar
>> to |struct drm_gem_{cma, shmem}_object|. Second, the patch set
>> introduces |struct drm_simple_ttm| (for the lack of a better name) and
>> helpers. It's an implementation of a basic TTM-based memory manager.
>>
>> Both GEM TTM and Simple TTM support VRAM and SYSTEM placements. Support
>> for TT could probably be added if necessary. Both can be used independently
>> of each other if desired by the DRM driver.
>>
>> Currently ast, bochs, mgag200, vboxvideo and hisilicon/hibmc can use
>> these helpers. Cirrus would also be a candidate, but as it's being
>> rewritten from scratch, I didn't bother doing the conversion.
>>
>> Future directions: with these changes, the respective drivers can also
>> share some of their mode-setting or fbdev code. GEM TTM could implement
>> PRIME helpers, which would allow for using the generic fbcon.
>>
>> The patch set is against a recent drm-tip.
>>
>> Thomas Zimmermann (15):
>>drm: Add |struct drm_gem_ttm_object| and helpers
>>drm: Add |struct drm_gem_ttm_object| callbacks for |struct
>>  ttm_bo_driver|
>>drm: Add |struct drm_gem_ttm_object| callbacks for |struct drm_driver|
>>drm: Add drm_gem_ttm_fill_create_dumb() to create dumb buffers
>>drm: Add Simple TTM, a memory manager for dedicated VRAM
>>drm/ast: Convert AST driver to |struct drm_gem_ttm_object|
>>drm/ast: Convert AST driver to Simple TTM
>>drm/bochs: Convert Bochs driver to |struct drm_gem_ttm_object|
>>drm/bochs: Convert Bochs driver to Simple TTM
>>drm/mgag200: Convert mgag200 driver to |struct drm_gem_ttm_object|
>>drm/mgag200: Convert mgag200 driver to Simple TTM
>>drm/vboxvideo: Convert vboxvideo driver to |struct drm_gem_ttm_object|
>>drm/vboxvideo: Convert vboxvideo driver to Simple TTM
>>drm/hisilicon: Convert hibmc-drm driver to |struct drm_gem_ttm_object|
>>drm/hisilicon: Convert hibmc-drm driver to Simple TTM
>>
>>   Documentation/gpu/drm-mm.rst  |  23 +
>>   drivers/gpu/drm/Kconfig   |  20 +
>>   drivers/gpu/drm/Makefile  |   5 +
>>   drivers/gpu/drm/ast/Kconfig   |   3 +-
>>   drivers/gpu/drm/ast/ast_drv.c |   4 +-
>>   drivers/gpu/drm/ast/ast_drv.h |  58 +-
>>   drivers/gpu/drm/ast/ast_fb.c  |  18 

Re: [PATCH 00/15] Share TTM code among framebuffer drivers

2019-04-08 Thread Koenig, Christian
Well, the first problem is that I'm not sure this is a good idea. Essentially
we want to get rid of TTM in the long run.

On the other hand this work might aid with that goal, so it might be 
worth a try.

Second is that this might actually not work out of the box. The problem is here:
> + /* TODO: This test used to be performed by drivers, but can
> +  * this actually happen? If so, should we put the check into
> +  * drm_gem_ttm_of_gem()? */
> + if (!drm_is_gem_ttm(bo))
> + return;

Yeah, this test is mandatory because TTM can by itself present buffer
objects which don't belong to the driver being called.

E.g. we sometimes have BOs which don't belong to the current driver on
a driver-specific LRU. A totally brain-dead design if you ask me, but
that's how it is.

Not 100% sure, but by converting all drivers to use a common GEM_TTM 
backend you might actually break that test.

I'm not sure if that is actually a problem in the real world, it most 
likely isn't. But I still won't bet on it without being able to test this.

Regards,
Christian.

Am 08.04.19 um 11:21 schrieb Thomas Zimmermann:
> Several simple framebuffer drivers copy most of the TTM code from each
> other. The implementation is always the same, except for the names of
> some data structures.
>
> As recently discussed, this patch set provides generic TTM memory-
> management code for framebuffers with dedicated video memory. It further
> converts the respective drivers to the generic code. The shared code
> is basically the same implementation as the one copied among individual
> drivers.
>
> The patch set contains two major changes: first, it introduces
> |struct drm_gem_ttm_object| and helpers. It's a GEM object that is
> backed by TTM-managed memory. The type's purpose is somewhat similar
> to |struct drm_gem_{cma, shmem}_object|. Second, the patch set
> introduces |struct drm_simple_ttm| (for the lack of a better name) and
> helpers. It's an implementation of a basic TTM-based memory manager.
>
> Both GEM TTM and Simple TTM support VRAM and SYSTEM placements. Support
> for TT could probably be added if necessary. Both can be used independently
> of each other if desired by the DRM driver.
>
> Currently ast, bochs, mgag200, vboxvideo and hisilicon/hibmc can use
> these helpers. Cirrus would also be a candidate, but as it's being
> rewritten from scratch, I didn't bother doing the conversion.
>
> Future directions: with these changes, the respective drivers can also
> share some of their mode-setting or fbdev code. GEM TTM could implement
> PRIME helpers, which would allow for using the generic fbcon.
>
> The patch set is against a recent drm-tip.
>
> Thomas Zimmermann (15):
>drm: Add |struct drm_gem_ttm_object| and helpers
>drm: Add |struct drm_gem_ttm_object| callbacks for |struct
>  ttm_bo_driver|
>drm: Add |struct drm_gem_ttm_object| callbacks for |struct drm_driver|
>drm: Add drm_gem_ttm_fill_create_dumb() to create dumb buffers
>drm: Add Simple TTM, a memory manager for dedicated VRAM
>drm/ast: Convert AST driver to |struct drm_gem_ttm_object|
>drm/ast: Convert AST driver to Simple TTM
>drm/bochs: Convert Bochs driver to |struct drm_gem_ttm_object|
>drm/bochs: Convert Bochs driver to Simple TTM
>drm/mgag200: Convert mgag200 driver to |struct drm_gem_ttm_object|
>drm/mgag200: Convert mgag200 driver to Simple TTM
>drm/vboxvideo: Convert vboxvideo driver to |struct drm_gem_ttm_object|
>drm/vboxvideo: Convert vboxvideo driver to Simple TTM
>drm/hisilicon: Convert hibmc-drm driver to |struct drm_gem_ttm_object|
>drm/hisilicon: Convert hibmc-drm driver to Simple TTM
>
>   Documentation/gpu/drm-mm.rst  |  23 +
>   drivers/gpu/drm/Kconfig   |  20 +
>   drivers/gpu/drm/Makefile  |   5 +
>   drivers/gpu/drm/ast/Kconfig   |   3 +-
>   drivers/gpu/drm/ast/ast_drv.c |   4 +-
>   drivers/gpu/drm/ast/ast_drv.h |  58 +-
>   drivers/gpu/drm/ast/ast_fb.c  |  18 +-
>   drivers/gpu/drm/ast/ast_main.c|  74 +--
>   drivers/gpu/drm/ast/ast_mode.c|  78 +--
>   drivers/gpu/drm/ast/ast_ttm.c | 290 +-
>   drivers/gpu/drm/bochs/Kconfig |   2 +
>   drivers/gpu/drm/bochs/bochs.h |  42 +-
>   drivers/gpu/drm/bochs/bochs_drv.c |   4 +-
>   drivers/gpu/drm/bochs/bochs_kms.c |  18 +-
>   drivers/gpu/drm/bochs/bochs_mm.c  | 392 +-
>   drivers/gpu/drm/drm_gem_ttm_helper.c  | 507 ++
>   drivers/gpu/drm/drm_simple_ttm_helper.c   | 191 +++
>   drivers/gpu/drm/drm_ttm_helper_common.c   |   6 +
>   drivers/gpu/drm/hisilicon/hibmc/Kconfig   |   2 +
>   .../gpu/drm/hisilicon/hibmc/hibmc_drm_de.c|  19 +-
>   .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   |   4 +-
>