Re: [PATCH v5 00/13] Add video damage tracking

2023-08-30 Thread Alexander Graf



On 29.08.23 11:19, Mark Kettenis wrote:

Date: Tue, 29 Aug 2023 08:20:49 +0200
From: Alexander Graf 

On 28.08.23 23:54, Heinrich Schuchardt wrote:

On 8/28/23 22:24, Alexander Graf wrote:

On 28.08.23 19:54, Simon Glass wrote:

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:

On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf 
wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf 
wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf 
wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
 wrote:

This is a rebase of Alexander Graf's video damage tracking
series, with
some tests and other changes. The original cover letter is
as follows:


This patch set speeds up graphics output on ARM by a
factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map
it as cached,
but need it accessible by the display controller which
reads directly
from a later point of consistency. Hence, we flush the
frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are
seeing frame buffers
that can take a while to flush out. This was reported by
Da Xue with grub,
which happily print 1000s of spaces on the screen to draw
a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It
literally prints
over space characters for every character in its menu that it
wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle
that gets
modified. But without this patch set, we're flushing the full
DRAM
buffer on every u-boot text console character write, which
means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text
console as
well with this patch set, because even "normal" text prints
that write
for example a single line of text on the screen today flush
the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes
the cache
after every character. It doesn't do that for normal character
output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the
"print
character" level, so it calls UEFI's sopisticated
"output_string"
callback with single characters at a time, which means we do a
full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165




This patch set implements the easiest mitigation against
this problem:
Damage tracking. We remember the lowest common denominator
region that was
touched since the last video_sync() call and only flush
that. The most
typical writer to the frame buffer is the video console,
which always
writes rectangles of characters on the screen and syncs
afterwards.

With this patch set applied, we reduce drawing a large
grub menu (with
serial console attached for size information) on an
RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism,
reducing its
overhead compared to before as well. So even x86 systems
should be faster
with this now :).


Alternatives considered:

     1) Lazy sync - Sandbox does this. It only calls
video_sync(true) ever
    so often. We are missing timers to do this
generically.

     2) Double buffering - We could try to identify
whether anything changed
    at all and only draw to the FB if it did. That
would require
    maintaining a second buffer that we need to
scan.

     3) Text buffer - Maintain a buffer of all text
printed on the screen with
    respective location. Don't write if the old and
new character are
    identical. This would limit applicability to
text only and is an
    optimization on top of this patch set.

     4) Hash screen lines - Create a hash (sha256?)
over every line when it
    changes. Only flush when it does. I'm not sure
if this would waste
    more time, memory and cache than the current
approach. It would make
    full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check
into a function"
- Add patch "video: test: Support checking copy frame
buffer contents"
- Add patch "video: test: Test partial updates of hardware
frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and f

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-30 Thread Alexander Graf



On 30.08.23 20:27, Alper Nebi Yasak wrote:

Hi all,

On 2023-08-29 09:27 +03:00, Alexander Graf wrote:

On 29.08.23 00:08, Simon Glass wrote:

On Mon, 28 Aug 2023 at 14:24, Alexander Graf  wrote:

On 28.08.23 19:54, Simon Glass wrote:

U-Boot does have video_sync() but it doesn't know when to call it. If
it does not call it, then any amount of single-threaded code can run
after that, which may update the framebuffer. In other words, U-Boot
is in exactly the same boat as UEFI. It has to decide whether to call
video_sync() based on some sort of heuristic.

That is the only point I am trying to make here. Does that make sense?

Oh, I thought you mentioned above that U-Boot is in a better spot or
"has it solved already". I agree - it's in the same boat and the only
safe thing it can really do today that is fully cross-platform
compatible is to call video_sync() after every character.

I don't understand what you mean by "any amount of single-threaded code
can run after that, which may update the framebuffer". Any framebuffer
modification is U-Boot internal code which then again can apply
video_sync() to tell the system "I want what I wrote to screen actually
be on screen now". I don't think that's necessarily bad design. A bit
clunky, but we're in a pre-boot environment after all.

Since we're aligned now: What exactly did you refer to with "but we have
managed to work out something"?

So now we are on the same page about that point. The next step is my

heuristic point:

vidconsole_putc_xy() - does not call video_sync()
vidconsole_newline() - does

I am simply suggesting that UEFI should do the same thing.


I think the better analogy is with

vidconsole_puts() - does

and that's exactly the function that the UEFI code calls. The UEFI
interface is "write this long string to screen". All the UEFI code does
is call vidconsole_puts() to do that which then (rightfully) calls
video_sync().

The reason we flush after every character with grub is grub: Grub abuses
the "write long string to screen" function and instead only writes a
single character on each call, which then means we flush on every
character write.

There's another reason I discovered empirically as I tried to implement
cyclic video_sync()s instead of calling it whenever. The writes will go
through eventually (as the cache is filled by other work?) even if *we*
don't explicitly flush it. That means partial data gets pushed to the
display in an uncontrolled way, resulting in bad graphical artifacts.

And I mean very noticeable things like a blocky waterfall effect when
filling a blue rectangle background in GRUB menu, or half-rendered
letters in U-Boot console (very obvious if I get it to panic-and-hang).

I think that locks it down, and there's two reasonable things we can do:

- Write and immediately flush to fb (hardware) every time
- Batch writes in fb, periodically write-and-flush to copy_fb (hardware)

Both can utilize a damage tracking feature to minimize the amount of
copy/flush done. And maybe we can implement the two simultaneously if we
skip flushing when damaged region is zero-sized already-flushed.

There's a flaw with the second approach though, EFI can have direct
access to the hardware copy_fb. When it has directly written to the
framebuffer, we would need to at least stop overwriting that, and
ideally copy backwards to the non-hardware fb. Is there some kind of a
locking API that EFI apps use to get/release the framebuffer? We could
hook that into something like that.



Edk2 also has a shadow frame buffer that it uses for VGA because 
otherwise the NC read accesses from VRAM would be too expensive. I don't 
believe there's any UEFI locking mechanism, but clear understanding that 
if you want to access the frame buffer while anything else but you could 
potentially access it too, you better use the UEFI APIs instead of 
scribbling on it yourself.




Note that I've been using "flush" and not "sync" to avoid talking about
when a driver ops->video_sync() should be called. Is that fundamentally
incompatible with EFI, can we even call that after e.g. ExitBootServices
where the OS wants to use it as efifb? Maybe we should always call that
periodically at 60Hz (1us) or so?



If you actually need to do something actively frequently, then that 
won't work anymore after ExitBootServices. At that point, all "normal" 
U-Boot code is gone.




(I'm testing on rk3399-gru-kevin with a 2400x1600 eDP screen by the way,
and attaching some WIP patches if you want to test. Debian arm64 netinst
iso uses text-mode GRUB by default, in case you need a payload to test.)


Also I notice that EFI calls notify? all the time, so U-Boot probably
does have the ability to sync the video every 10ms if we wanted to.

BTW, with attached cyclic patch on chromebook_kevin, I immediately get a
warning that it takes too long, at 2-8ms without/with VIDEO_COPY. It's
about 11ms for both on sandbox on my PC.



I would expect explicit damage flushes like the patch set o

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-30 Thread Alper Nebi Yasak
Hi all,

On 2023-08-29 09:27 +03:00, Alexander Graf wrote:
> On 29.08.23 00:08, Simon Glass wrote:
>> On Mon, 28 Aug 2023 at 14:24, Alexander Graf  wrote:
>>> On 28.08.23 19:54, Simon Glass wrote:
 U-Boot does have video_sync() but it doesn't know when to call it. If
 it does not call it, then any amount of single-threaded code can run
 after that, which may update the framebuffer. In other words, U-Boot
 is in exactly the same boat as UEFI. It has to decide whether to call
 video_sync() based on some sort of heuristic.

 That is the only point I am trying to make here. Does that make sense?
>>>
>>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>>> "has it solved already". I agree - it's in the same boat and the only
>>> safe thing it can really do today that is fully cross-platform
>>> compatible is to call video_sync() after every character.
>>>
>>> I don't understand what you mean by "any amount of single-threaded code
>>> can run after that, which may update the framebuffer". Any framebuffer
>>> modification is U-Boot internal code which then again can apply
>>> video_sync() to tell the system "I want what I wrote to screen actually
>>> be on screen now". I don't think that's necessarily bad design. A bit
>>> clunky, but we're in a pre-boot environment after all.
>>>
>>> Since we're aligned now: What exactly did you refer to with "but we have
>>> managed to work out something"?
 So now we are on the same page about that point. The next step is my
>> heuristic point:
>>
>> vidconsole_putc_xy() - does not call video_sync()
>> vidconsole_newline() - does
>>
>> I am simply suggesting that UEFI should do the same thing.
> 
> 
> I think the better analogy is with
> 
> vidconsole_puts() - does
> 
> and that's exactly the function that the UEFI code calls. The UEFI 
> interface is "write this long string to screen". All the UEFI code does 
> is call vidconsole_puts() to do that which then (rightfully) calls 
> video_sync().
> 
> The reason we flush after every character with grub is grub: Grub abuses 
> the "write long string to screen" function and instead only writes a 
> single character on each call, which then means we flush on every 
> character write.

There's another reason I discovered empirically as I tried to implement
cyclic video_sync()s instead of calling it whenever. The writes will go
through eventually (as the cache is filled by other work?) even if *we*
don't explicitly flush it. That means partial data gets pushed to the
display in an uncontrolled way, resulting in bad graphical artifacts.

And I mean very noticeable things like a blocky waterfall effect when
filling a blue rectangle background in GRUB menu, or half-rendered
letters in U-Boot console (very obvious if I get it to panic-and-hang).

I think that locks it down, and there's two reasonable things we can do:

- Write and immediately flush to fb (hardware) every time
- Batch writes in fb, periodically write-and-flush to copy_fb (hardware)

Both can utilize a damage tracking feature to minimize the amount of
copy/flush done. And maybe we can implement the two simultaneously if we
skip flushing when damaged region is zero-sized already-flushed.

There's a flaw with the second approach though, EFI can have direct
access to the hardware copy_fb. When it has directly written to the
framebuffer, we would need to at least stop overwriting that, and
ideally copy backwards to the non-hardware fb. Is there some kind of a
locking API that EFI apps use to get/release the framebuffer? We could
hook that into something like that.

Note that I've been using "flush" and not "sync" to avoid talking about
when a driver ops->video_sync() should be called. Is that fundamentally
incompatible with EFI, can we even call that after e.g. ExitBootServices
where the OS wants to use it as efifb? Maybe we should always call that
periodically at 60Hz (1us) or so?

(I'm testing on rk3399-gru-kevin with a 2400x1600 eDP screen by the way,
and attaching some WIP patches if you want to test. Debian arm64 netinst
iso uses text-mode GRUB by default, in case you need a payload to test.)

>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>> does have the ability to sync the video every 10ms if we wanted to.

BTW, with attached cyclic patch on chromebook_kevin, I immediately get a
warning that it takes too long, at 2-8ms without/with VIDEO_COPY. It's
about 11ms for both on sandbox on my PC.

> I fail to see how invalidating the frame buffer for the screen every
> 10ms is any better than doing flush+invalidate operations only on screen
> areas that changed? It's more fragile, more difficult to understand and
> also significantly more expensive given that most of the time very
> little on the screen actually changes.
 I am not suggesting it is better, at all. I am just trying to explain
 that the U-Boot EFI implementation should not be calling video_sync

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-29 Thread Mark Kettenis
> Date: Tue, 29 Aug 2023 08:20:49 +0200
> From: Alexander Graf 
> 
> On 28.08.23 23:54, Heinrich Schuchardt wrote:
> > On 8/28/23 22:24, Alexander Graf wrote:
> >>
> >> On 28.08.23 19:54, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:
>  Hey Simon,
> 
>  On 22.08.23 20:56, Simon Glass wrote:
> > Hi Alex,
> >
> > On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:
> >> On 22.08.23 01:03, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf  
> >>> wrote:
>  On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 14:20, Alexander Graf 
> > wrote:
> >> On 21.08.23 21:57, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf 
> >>> wrote:
>  On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
> >  wrote:
> >> This is a rebase of Alexander Graf's video damage tracking
> >> series, with
> >> some tests and other changes. The original cover letter is
> >> as follows:
> >>
> >>> This patch set speeds up graphics output on ARM by a
> >>> factor of 60x.
> >>>
> >>> On most ARM SBCs, we keep the frame buffer in DRAM and map
> >>> it as cached,
> >>> but need it accessible by the display controller which
> >>> reads directly
> >>> from a later point of consistency. Hence, we flush the
> >>> frame buffer to
> >>> DRAM on every change. The full frame buffer.
> > It should not, see below.
> >
> >>> Unfortunately, with the advent of 4k displays, we are
> >>> seeing frame buffers
> >>> that can take a while to flush out. This was reported by
> >>> Da Xue with grub,
> >>> which happily print 1000s of spaces on the screen to draw
> >>> a menu. Every
> >>> printed space triggers a cache flush.
> > That is a bug somewhere in EFI.
>  Unfortunately not :). You may call it a bug in grub: It
>  literally prints
>  over space characters for every character in its menu that it
>  wants
>  cleared. On every text screen draw.
> 
>  This wouldn't be a big issue if we only flush the reactangle
>  that gets
>  modified. But without this patch set, we're flushing the full
>  DRAM
>  buffer on every u-boot text console character write, which
>  means for
>  every character (as that's the only API UEFI has).
> 
>  As a nice side effect, we speed up the normal U-Boot text
>  console as
>  well with this patch set, because even "normal" text prints
>  that write
>  for example a single line of text on the screen today flush
>  the full
>  frame buffer to DRAM.
> >>> No, I mean that it is a bug that U-Boot (apparently) flushes
> >>> the cache
> >>> after every character. It doesn't do that for normal character
> >>> output
> >>> and I don't think it makes sense to do it for EFI either.
> >> I see. Let's trace the calls:
> >>
> >> efi_cout_output_string()
> >> -> fputs()
> >> -> vidconsole_puts()
> >> -> video_sync()
> >> -> flush_dcache_range()
> >>
> >> Unfortunately grub abstracts character backends down to the 
> >> "print
> >> character" level, so it calls UEFI's sopisticated 
> >> "output_string"
> >> callback with single characters at a time, which means we do a
> >> full
> >> dcache flush for every character that we print:
> >>
> >> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>  
> >>
> >>
> >>
> >>> This patch set implements the easiest mitigation against
> >>> this problem:
> >>> Damage tracking. We remember the lowest common denominator
> >>> region that was
> >>> touched since the last video_sync() call and only flush
> >>> that. The most
> >>> typical writer to the frame buffer is the video console,
> >>> which always
> >>> writes rectangles of characters on the screen and syncs
> >>> afterwards.
> >>>
> >>> With this patch set applied, we reduce drawing a large
> >>> grub menu (with
> >>> serial

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Alexander Graf

Hi Simon,

On 29.08.23 00:08, Simon Glass wrote:

Hi Alex,

On Mon, 28 Aug 2023 at 14:24, Alexander Graf  wrote:


On 28.08.23 19:54, Simon Glass wrote:

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:

On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

 1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
so often. We are missing timers to do this generically.

 2) Double buffering - We could try to identify whether anything changed
at all and only draw to the FB if it did. That would require
maintaining a second buffer that we need to scan.

 3) Text buffer - Maintain a buffer of all text printed on the screen 
with
respective location. Don't write if the old and new character are
identical. This would limit applicability to text only and is an
optimization on top of this patch set.

 4) Hash screen lines - Create a hash (sha256?) over every line when it
changes. Only flush when it does. I'm not sure if this would waste
more time, memory and cache than the current approach. It would make
full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix un

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Alexander Graf



On 28.08.23 23:54, Heinrich Schuchardt wrote:

On 8/28/23 22:24, Alexander Graf wrote:


On 28.08.23 19:54, Simon Glass wrote:

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:

On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  
wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf 
wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf 
wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
 wrote:

This is a rebase of Alexander Graf's video damage tracking
series, with
some tests and other changes. The original cover letter is
as follows:


This patch set speeds up graphics output on ARM by a
factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map
it as cached,
but need it accessible by the display controller which
reads directly
from a later point of consistency. Hence, we flush the
frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are
seeing frame buffers
that can take a while to flush out. This was reported by
Da Xue with grub,
which happily print 1000s of spaces on the screen to draw
a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It
literally prints
over space characters for every character in its menu that it
wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle
that gets
modified. But without this patch set, we're flushing the full
DRAM
buffer on every u-boot text console character write, which
means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text
console as
well with this patch set, because even "normal" text prints
that write
for example a single line of text on the screen today flush
the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes
the cache
after every character. It doesn't do that for normal character
output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the 
"print
character" level, so it calls UEFI's sopisticated 
"output_string"

callback with single characters at a time, which means we do a
full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 





This patch set implements the easiest mitigation against
this problem:
Damage tracking. We remember the lowest common denominator
region that was
touched since the last video_sync() call and only flush
that. The most
typical writer to the frame buffer is the video console,
which always
writes rectangles of characters on the screen and syncs
afterwards.

With this patch set applied, we reduce drawing a large
grub menu (with
serial console attached for size information) on an
RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism,
reducing its
overhead compared to before as well. So even x86 systems
should be faster
with this now :).


Alternatives considered:

    1) Lazy sync - Sandbox does this. It only calls
video_sync(true) ever
   so often. We are missing timers to do this
generically.

    2) Double buffering - We could try to identify
whether anything changed
   at all and only draw to the FB if it did. That
would require
   maintaining a second buffer that we need to 
scan.


    3) Text buffer - Maintain a buffer of all text
printed on the screen with
   respective location. Don't write if the old and
new character are
   identical. This would limit applicability to
text only and is an
   optimization on top of this patch set.

    4) Hash screen lines - Create a hash (sha256?)
over every line when it
   changes. Only flush when it does. I'm not sure
if this would waste
   more time, memory and cache than the current
approach. It would make
   full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check
into a function"
- Add patch "video: test: Support checking copy frame
buffer contents"
- Add patch "video: test: Test partial updates of hardware
frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv
comment
- Return void from video_damage()
- Fix undeclared priv error in video_

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Simon Glass
Hi Alex,

On Mon, 28 Aug 2023 at 14:24, Alexander Graf  wrote:
>
>
> On 28.08.23 19:54, Simon Glass wrote:
> > Hi Alex,
> >
> > On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:
> >> Hey Simon,
> >>
> >> On 22.08.23 20:56, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:
>  On 22.08.23 01:03, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:
> >> On 22.08.23 00:10, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:
>  On 21.08.23 21:57, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:33, Alexander Graf  
> > wrote:
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak 
> >>>  wrote:
>  This is a rebase of Alexander Graf's video damage tracking 
>  series, with
>  some tests and other changes. The original cover letter is as 
>  follows:
> 
> > This patch set speeds up graphics output on ARM by a factor of 
> > 60x.
> >
> > On most ARM SBCs, we keep the frame buffer in DRAM and map it 
> > as cached,
> > but need it accessible by the display controller which reads 
> > directly
> > from a later point of consistency. Hence, we flush the frame 
> > buffer to
> > DRAM on every change. The full frame buffer.
> >>> It should not, see below.
> >>>
> > Unfortunately, with the advent of 4k displays, we are seeing 
> > frame buffers
> > that can take a while to flush out. This was reported by Da Xue 
> > with grub,
> > which happily print 1000s of spaces on the screen to draw a 
> > menu. Every
> > printed space triggers a cache flush.
> >>> That is a bug somewhere in EFI.
> >> Unfortunately not :). You may call it a bug in grub: It literally 
> >> prints
> >> over space characters for every character in its menu that it wants
> >> cleared. On every text screen draw.
> >>
> >> This wouldn't be a big issue if we only flush the reactangle that 
> >> gets
> >> modified. But without this patch set, we're flushing the full DRAM
> >> buffer on every u-boot text console character write, which means 
> >> for
> >> every character (as that's the only API UEFI has).
> >>
> >> As a nice side effect, we speed up the normal U-Boot text console 
> >> as
> >> well with this patch set, because even "normal" text prints that 
> >> write
> >> for example a single line of text on the screen today flush the 
> >> full
> >> frame buffer to DRAM.
> > No, I mean that it is a bug that U-Boot (apparently) flushes the 
> > cache
> > after every character. It doesn't do that for normal character 
> > output
> > and I don't think it makes sense to do it for EFI either.
>  I see. Let's trace the calls:
> 
>  efi_cout_output_string()
>  -> fputs()
>  -> vidconsole_puts()
>  -> video_sync()
>  -> flush_dcache_range()
> 
>  Unfortunately grub abstracts character backends down to the "print
>  character" level, so it calls UEFI's sopisticated "output_string"
>  callback with single characters at a time, which means we do a full
>  dcache flush for every character that we print:
> 
>  https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> 
> 
> > This patch set implements the easiest mitigation against this 
> > problem:
> > Damage tracking. We remember the lowest common denominator 
> > region that was
> > touched since the last video_sync() call and only flush that. 
> > The most
> > typical writer to the frame buffer is the video console, which 
> > always
> > writes rectangles of characters on the screen and syncs 
> > afterwards.
> >
> > With this patch set applied, we reduce drawing a large grub 
> > menu (with
> > serial console attached for size information) on an RK3399-ROC 
> > system
> > at 1440p from 55 seconds to less than 1 second.
> >
> > Version 2 also implements VIDEO_COPY using this mechanism, 
> > reducing its
> > overhead compared to before as well. So even x86 systems should 
> > be faster
> > with this now :).
> 

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Heinrich Schuchardt

On 8/28/23 22:24, Alexander Graf wrote:


On 28.08.23 19:54, Simon Glass wrote:

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:

On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf 
wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf 
wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
 wrote:

This is a rebase of Alexander Graf's video damage tracking
series, with
some tests and other changes. The original cover letter is
as follows:


This patch set speeds up graphics output on ARM by a
factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map
it as cached,
but need it accessible by the display controller which
reads directly
from a later point of consistency. Hence, we flush the
frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are
seeing frame buffers
that can take a while to flush out. This was reported by
Da Xue with grub,
which happily print 1000s of spaces on the screen to draw
a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It
literally prints
over space characters for every character in its menu that it
wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle
that gets
modified. But without this patch set, we're flushing the full
DRAM
buffer on every u-boot text console character write, which
means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text
console as
well with this patch set, because even "normal" text prints
that write
for example a single line of text on the screen today flush
the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes
the cache
after every character. It doesn't do that for normal character
output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a
full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against
this problem:
Damage tracking. We remember the lowest common denominator
region that was
touched since the last video_sync() call and only flush
that. The most
typical writer to the frame buffer is the video console,
which always
writes rectangles of characters on the screen and syncs
afterwards.

With this patch set applied, we reduce drawing a large
grub menu (with
serial console attached for size information) on an
RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism,
reducing its
overhead compared to before as well. So even x86 systems
should be faster
with this now :).


Alternatives considered:

    1) Lazy sync - Sandbox does this. It only calls
video_sync(true) ever
   so often. We are missing timers to do this
generically.

    2) Double buffering - We could try to identify
whether anything changed
   at all and only draw to the FB if it did. That
would require
   maintaining a second buffer that we need to scan.

    3) Text buffer - Maintain a buffer of all text
printed on the screen with
   respective location. Don't write if the old and
new character are
   identical. This would limit applicability to
text only and is an
   optimization on top of this patch set.

    4) Hash screen lines - Create a hash (sha256?)
over every line when it
   changes. Only flush when it does. I'm not sure
if this would waste
   more time, memory and cache than the current
approach. It would make
   full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check
into a function"
- Add patch "video: test: Support checking copy frame
buffer contents"
- Add patch "video: test: Test partial updates of hardware
frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv
comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Alexander Graf



On 28.08.23 19:54, Simon Glass wrote:

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:

On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
   so often. We are missing timers to do this generically.

2) Double buffering - We could try to identify whether anything changed
   at all and only draw to the FB if it did. That would require
   maintaining a second buffer that we need to scan.

3) Text buffer - Maintain a buffer of all text printed on the screen 
with
   respective location. Don't write if the old and new character are
   identical. This would limit applicability to text only and is an
   optimization on top of this patch set.

4) Hash screen lines - Create a hash (sha256?) over every line when it
   changes. Only flush when it does. I'm not sure if this would waste
   more time, memory and cache than the current approach. It would make
   full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED(

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-28 Thread Simon Glass
Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf  wrote:
>
> Hey Simon,
>
> On 22.08.23 20:56, Simon Glass wrote:
> > Hi Alex,
> >
> > On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:
> >>
> >> On 22.08.23 01:03, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:
>  On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:
> >> On 21.08.23 21:57, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:
>  On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak 
> >  wrote:
> >> This is a rebase of Alexander Graf's video damage tracking series, 
> >> with
> >> some tests and other changes. The original cover letter is as 
> >> follows:
> >>
> >>> This patch set speeds up graphics output on ARM by a factor of 
> >>> 60x.
> >>>
> >>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as 
> >>> cached,
> >>> but need it accessible by the display controller which reads 
> >>> directly
> >>> from a later point of consistency. Hence, we flush the frame 
> >>> buffer to
> >>> DRAM on every change. The full frame buffer.
> > It should not, see below.
> >
> >>> Unfortunately, with the advent of 4k displays, we are seeing 
> >>> frame buffers
> >>> that can take a while to flush out. This was reported by Da Xue 
> >>> with grub,
> >>> which happily print 1000s of spaces on the screen to draw a menu. 
> >>> Every
> >>> printed space triggers a cache flush.
> > That is a bug somewhere in EFI.
>  Unfortunately not :). You may call it a bug in grub: It literally 
>  prints
>  over space characters for every character in its menu that it wants
>  cleared. On every text screen draw.
> 
>  This wouldn't be a big issue if we only flush the reactangle that 
>  gets
>  modified. But without this patch set, we're flushing the full DRAM
>  buffer on every u-boot text console character write, which means for
>  every character (as that's the only API UEFI has).
> 
>  As a nice side effect, we speed up the normal U-Boot text console as
>  well with this patch set, because even "normal" text prints that 
>  write
>  for example a single line of text on the screen today flush the full
>  frame buffer to DRAM.
> >>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>> after every character. It doesn't do that for normal character output
> >>> and I don't think it makes sense to do it for EFI either.
> >> I see. Let's trace the calls:
> >>
> >> efi_cout_output_string()
> >> -> fputs()
> >> -> vidconsole_puts()
> >> -> video_sync()
> >> -> flush_dcache_range()
> >>
> >> Unfortunately grub abstracts character backends down to the "print
> >> character" level, so it calls UEFI's sopisticated "output_string"
> >> callback with single characters at a time, which means we do a full
> >> dcache flush for every character that we print:
> >>
> >> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>
> >>
> >>> This patch set implements the easiest mitigation against this 
> >>> problem:
> >>> Damage tracking. We remember the lowest common denominator region 
> >>> that was
> >>> touched since the last video_sync() call and only flush that. The 
> >>> most
> >>> typical writer to the frame buffer is the video console, which 
> >>> always
> >>> writes rectangles of characters on the screen and syncs 
> >>> afterwards.
> >>>
> >>> With this patch set applied, we reduce drawing a large grub menu 
> >>> (with
> >>> serial console attached for size information) on an RK3399-ROC 
> >>> system
> >>> at 1440p from 55 seconds to less than 1 second.
> >>>
> >>> Version 2 also implements VIDEO_COPY using this mechanism, 
> >>> reducing its
> >>> overhead compared to before as well. So even x86 systems should 
> >>> be faster
> >>> with this now :).
> >>>
> >>>
> >>> Alternatives considered:
> >>>
> >>>1) Lazy sync - Sandbox does this. It only calls 
> >>> video_sync(true) ever
> >>>   so often. We are missing timers to do this generically.
> >>>
> >>>2) Double buffering - We could try to identify whether 
> >>> anything changed
> >>> 

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-23 Thread Alexander Graf

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:


On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:

On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

   1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
  so often. We are missing timers to do this generically.

   2) Double buffering - We could try to identify whether anything changed
  at all and only draw to the FB if it did. That would require
  maintaining a second buffer that we need to scan.

   3) Text buffer - Maintain a buffer of all text printed on the screen with
  respective location. Don't write if the old and new character are
  identical. This would limit applicability to text only and is an
  optimization on top of this patch set.

   4) Hash screen lines - Create a hash (sha256?) over every line when it
  changes. Only flush when it does. I'm not sure if this would waste
  more time, memory and cache than the current approach. It would make
  full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-22 Thread Simon Glass
Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf  wrote:
>
>
> On 22.08.23 01:03, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:
> >>
> >> On 22.08.23 00:10, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:
>  On 21.08.23 21:57, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak 
> >>>  wrote:
>  This is a rebase of Alexander Graf's video damage tracking series, 
>  with
>  some tests and other changes. The original cover letter is as 
>  follows:
> 
> > This patch set speeds up graphics output on ARM by a factor of 60x.
> >
> > On most ARM SBCs, we keep the frame buffer in DRAM and map it as 
> > cached,
> > but need it accessible by the display controller which reads 
> > directly
> > from a later point of consistency. Hence, we flush the frame buffer 
> > to
> > DRAM on every change. The full frame buffer.
> >>> It should not, see below.
> >>>
> > Unfortunately, with the advent of 4k displays, we are seeing frame 
> > buffers
> > that can take a while to flush out. This was reported by Da Xue 
> > with grub,
> > which happily print 1000s of spaces on the screen to draw a menu. 
> > Every
> > printed space triggers a cache flush.
> >>> That is a bug somewhere in EFI.
> >> Unfortunately not :). You may call it a bug in grub: It literally 
> >> prints
> >> over space characters for every character in its menu that it wants
> >> cleared. On every text screen draw.
> >>
> >> This wouldn't be a big issue if we only flush the reactangle that gets
> >> modified. But without this patch set, we're flushing the full DRAM
> >> buffer on every u-boot text console character write, which means for
> >> every character (as that's the only API UEFI has).
> >>
> >> As a nice side effect, we speed up the normal U-Boot text console as
> >> well with this patch set, because even "normal" text prints that write
> >> for example a single line of text on the screen today flush the full
> >> frame buffer to DRAM.
> > No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> > after every character. It doesn't do that for normal character output
> > and I don't think it makes sense to do it for EFI either.
>  I see. Let's trace the calls:
> 
>  efi_cout_output_string()
>  -> fputs()
>  -> vidconsole_puts()
>  -> video_sync()
>  -> flush_dcache_range()
> 
>  Unfortunately grub abstracts character backends down to the "print
>  character" level, so it calls UEFI's sopisticated "output_string"
>  callback with single characters at a time, which means we do a full
>  dcache flush for every character that we print:
> 
>  https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> 
> 
> > This patch set implements the easiest mitigation against this 
> > problem:
> > Damage tracking. We remember the lowest common denominator region 
> > that was
> > touched since the last video_sync() call and only flush that. The 
> > most
> > typical writer to the frame buffer is the video console, which 
> > always
> > writes rectangles of characters on the screen and syncs afterwards.
> >
> > With this patch set applied, we reduce drawing a large grub menu 
> > (with
> > serial console attached for size information) on an RK3399-ROC 
> > system
> > at 1440p from 55 seconds to less than 1 second.
> >
> > Version 2 also implements VIDEO_COPY using this mechanism, reducing 
> > its
> > overhead compared to before as well. So even x86 systems should be 
> > faster
> > with this now :).
> >
> >
> > Alternatives considered:
> >
> >   1) Lazy sync - Sandbox does this. It only calls 
> > video_sync(true) ever
> >  so often. We are missing timers to do this generically.
> >
> >   2) Double buffering - We could try to identify whether 
> > anything changed
> >  at all and only draw to the FB if it did. That would 
> > require
> >  maintaining a second buffer that we need to scan.
> >
> >   3) Text buffer - Maintain a buffer of all text printed on the 
> > screen with
> >  respective location. Don't write if the old and new 
> > character are
> >  identical

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-22 Thread Alexander Graf



On 22.08.23 01:03, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:


On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:

On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

  1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
 so often. We are missing timers to do this generically.

  2) Double buffering - We could try to identify whether anything changed
 at all and only draw to the FB if it did. That would require
 maintaining a second buffer that we need to scan.

  3) Text buffer - Maintain a buffer of all text printed on the screen with
 respective location. Don't write if the old and new character are
 identical. This would limit applicability to text only and is an
 optimization on top of this patch set.

  4) Hash screen lines - Create a hash (sha256?) over every line when it
 changes. Only flush when it does. I'm not sure if this would waste
 more time, memory and cache than the current approach. It would make
 full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention 

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Simon Glass
Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf  wrote:
>
>
> On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:
> >>
> >> On 21.08.23 21:57, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:
>  On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak 
> >  wrote:
> >> This is a rebase of Alexander Graf's video damage tracking series, with
> >> some tests and other changes. The original cover letter is as follows:
> >>
> >>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>
> >>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as 
> >>> cached,
> >>> but need it accessible by the display controller which reads directly
> >>> from a later point of consistency. Hence, we flush the frame buffer to
> >>> DRAM on every change. The full frame buffer.
> > It should not, see below.
> >
> >>> Unfortunately, with the advent of 4k displays, we are seeing frame 
> >>> buffers
> >>> that can take a while to flush out. This was reported by Da Xue with 
> >>> grub,
> >>> which happily print 1000s of spaces on the screen to draw a menu. 
> >>> Every
> >>> printed space triggers a cache flush.
> > That is a bug somewhere in EFI.
>  Unfortunately not :). You may call it a bug in grub: It literally prints
>  over space characters for every character in its menu that it wants
>  cleared. On every text screen draw.
> 
>  This wouldn't be a big issue if we only flush the reactangle that gets
>  modified. But without this patch set, we're flushing the full DRAM
>  buffer on every u-boot text console character write, which means for
>  every character (as that's the only API UEFI has).
> 
>  As a nice side effect, we speed up the normal U-Boot text console as
>  well with this patch set, because even "normal" text prints that write
>  for example a single line of text on the screen today flush the full
>  frame buffer to DRAM.
> >>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>> after every character. It doesn't do that for normal character output
> >>> and I don't think it makes sense to do it for EFI either.
> >>
> >> I see. Let's trace the calls:
> >>
> >> efi_cout_output_string()
> >> -> fputs()
> >> -> vidconsole_puts()
> >> -> video_sync()
> >> -> flush_dcache_range()
> >>
> >> Unfortunately grub abstracts character backends down to the "print
> >> character" level, so it calls UEFI's sopisticated "output_string"
> >> callback with single characters at a time, which means we do a full
> >> dcache flush for every character that we print:
> >>
> >> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>
> >>
> >>> This patch set implements the easiest mitigation against this problem:
> >>> Damage tracking. We remember the lowest common denominator region 
> >>> that was
> >>> touched since the last video_sync() call and only flush that. The most
> >>> typical writer to the frame buffer is the video console, which always
> >>> writes rectangles of characters on the screen and syncs afterwards.
> >>>
> >>> With this patch set applied, we reduce drawing a large grub menu (with
> >>> serial console attached for size information) on an RK3399-ROC system
> >>> at 1440p from 55 seconds to less than 1 second.
> >>>
> >>> Version 2 also implements VIDEO_COPY using this mechanism, reducing 
> >>> its
> >>> overhead compared to before as well. So even x86 systems should be 
> >>> faster
> >>> with this now :).
> >>>
> >>>
> >>> Alternatives considered:
> >>>
> >>>  1) Lazy sync - Sandbox does this. It only calls video_sync(true) 
> >>> ever
> >>> so often. We are missing timers to do this generically.
> >>>
> >>>  2) Double buffering - We could try to identify whether anything 
> >>> changed
> >>> at all and only draw to the FB if it did. That would require
> >>> maintaining a second buffer that we need to scan.
> >>>
> >>>  3) Text buffer - Maintain a buffer of all text printed on the 
> >>> screen with
> >>> respective location. Don't write if the old and new character 
> >>> are
> >>> identical. This would limit applicability to text only and is 
> >>> an
> >>> optimization on top of this patch set.
> >>>
> >>>  4) Hash screen lines - Create a hash (sha256?) over every line 
> >>> when it
> >>> changes. Only flush when it does. I'm not sure if this would 
> >>> waste
> >>> more time, memory and cache than the current approach. It 
> >>> would make
> >>> full screen

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Alexander Graf



On 22.08.23 00:10, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:


On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:

On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.

Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.


I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sopisticated "output_string"
callback with single characters at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165



This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

 1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
so often. We are missing timers to do this generically.

 2) Double buffering - We could try to identify whether anything changed
at all and only draw to the FB if it did. That would require
maintaining a second buffer that we need to scan.

 3) Text buffer - Maintain a buffer of all text printed on the screen with
respective location. Don't write if the old and new character are
identical. This would limit applicability to text only and is an
optimization on top of this patch set.

 4) Hash screen lines - Create a hash (sha256?) over every line when it
changes. Only flush when it does. I'm not sure if this would waste
more time, memory and cache than the current approach. It would make
full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention about not having minimal damage for console_rotate.c
- Add patch "video: test: Test video damage tracking via vidco

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Simon Glass
Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf  wrote:
>
>
> On 21.08.23 21:57, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:
> >>
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  
> >>> wrote:
>  This is a rebase of Alexander Graf's video damage tracking series, with
>  some tests and other changes. The original cover letter is as follows:
> 
> > This patch set speeds up graphics output on ARM by a factor of 60x.
> >
> > On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> > but need it accessible by the display controller which reads directly
> > from a later point of consistency. Hence, we flush the frame buffer to
> > DRAM on every change. The full frame buffer.
> >>> It should not, see below.
> >>>
> > Unfortunately, with the advent of 4k displays, we are seeing frame 
> > buffers
> > that can take a while to flush out. This was reported by Da Xue with 
> > grub,
> > which happily print 1000s of spaces on the screen to draw a menu. Every
> > printed space triggers a cache flush.
> >>> That is a bug somewhere in EFI.
> >>
> >> Unfortunately not :). You may call it a bug in grub: It literally prints
> >> over space characters for every character in its menu that it wants
> >> cleared. On every text screen draw.
> >>
> >> This wouldn't be a big issue if we only flush the reactangle that gets
> >> modified. But without this patch set, we're flushing the full DRAM
> >> buffer on every u-boot text console character write, which means for
> >> every character (as that's the only API UEFI has).
> >>
> >> As a nice side effect, we speed up the normal U-Boot text console as
> >> well with this patch set, because even "normal" text prints that write
> >> for example a single line of text on the screen today flush the full
> >> frame buffer to DRAM.
> > No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> > after every character. It doesn't do that for normal character output
> > and I don't think it makes sense to do it for EFI either.
>
>
> I see. Let's trace the calls:
>
> efi_cout_output_string()
> -> fputs()
> -> vidconsole_puts()
> -> video_sync()
> -> flush_dcache_range()
>
> Unfortunately grub abstracts character backends down to the "print
> character" level, so it calls UEFI's sopisticated "output_string"
> callback with single characters at a time, which means we do a full
> dcache flush for every character that we print:
>
> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>
>
> >
> >>
> > This patch set implements the easiest mitigation against this problem:
> > Damage tracking. We remember the lowest common denominator region that 
> > was
> > touched since the last video_sync() call and only flush that. The most
> > typical writer to the frame buffer is the video console, which always
> > writes rectangles of characters on the screen and syncs afterwards.
> >
> > With this patch set applied, we reduce drawing a large grub menu (with
> > serial console attached for size information) on an RK3399-ROC system
> > at 1440p from 55 seconds to less than 1 second.
> >
> > Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> > overhead compared to before as well. So even x86 systems should be 
> > faster
> > with this now :).
> >
> >
> > Alternatives considered:
> >
> > 1) Lazy sync - Sandbox does this. It only calls video_sync(true) 
> > ever
> >so often. We are missing timers to do this generically.
> >
> > 2) Double buffering - We could try to identify whether anything 
> > changed
> >at all and only draw to the FB if it did. That would require
> >maintaining a second buffer that we need to scan.
> >
> > 3) Text buffer - Maintain a buffer of all text printed on the 
> > screen with
> >respective location. Don't write if the old and new character are
> >identical. This would limit applicability to text only and is an
> >optimization on top of this patch set.
> >
> > 4) Hash screen lines - Create a hash (sha256?) over every line when 
> > it
> >changes. Only flush when it does. I'm not sure if this would 
> > waste
> >more time, memory and cache than the current approach. It would 
> > make
> >full screen updates much more expensive.
> >>> 5) Fix the bug mentioned above?
> >>>
>  Changes in v5:
>  - Add patch "video: test: Split copy frame buffer check into a function"
>  - Add patch "video: test: Support checking copy frame buffer contents"
>  - Add patch "video: test: Test partial updates of hardware frame buffer"
>  - Use xstart, ystart, xend, yend as names for

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Alexander Graf



On 21.08.23 21:57, Simon Glass wrote:

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:


On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.


Unfortunately not :). You may call it a bug in grub: It literally prints
over space characters for every character in its menu that it wants
cleared. On every text screen draw.

This wouldn't be a big issue if we only flush the reactangle that gets
modified. But without this patch set, we're flushing the full DRAM
buffer on every u-boot text console character write, which means for
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as
well with this patch set, because even "normal" text prints that write
for example a single line of text on the screen today flush the full
frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.



I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print 
character" level, so it calls UEFI's sopisticated "output_string" 
callback with single characters at a time, which means we do a full 
dcache flush for every character that we print:


https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165







This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
   so often. We are missing timers to do this generically.

2) Double buffering - We could try to identify whether anything changed
   at all and only draw to the FB if it did. That would require
   maintaining a second buffer that we need to scan.

3) Text buffer - Maintain a buffer of all text printed on the screen with
   respective location. Don't write if the old and new character are
   identical. This would limit applicability to text only and is an
   optimization on top of this patch set.

4) Hash screen lines - Create a hash (sha256?) over every line when it
   changes. Only flush when it does. I'm not sure if this would waste
   more time, memory and cache than the current approach. It would make
   full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention about not having minimal damage for console_rotate.c
- Add patch "video: test: Test video damage tracking via vidconsole"
- Document new vdev field in struct efi_gop_obj comment
- Remove video_sync_copy() also from video_

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Simon Glass
Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf  wrote:
>
>
> On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  
> > wrote:
> >> This is a rebase of Alexander Graf's video damage tracking series, with
> >> some tests and other changes. The original cover letter is as follows:
> >>
> >>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>
> >>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>> but need it accessible by the display controller which reads directly
> >>> from a later point of consistency. Hence, we flush the frame buffer to
> >>> DRAM on every change. The full frame buffer.
> > It should not, see below.
> >
> >>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>> which happily print 1000s of spaces on the screen to draw a menu. Every
> >>> printed space triggers a cache flush.
> > That is a bug somewhere in EFI.
>
>
> Unfortunately not :). You may call it a bug in grub: It literally prints
> over space characters for every character in its menu that it wants
> cleared. On every text screen draw.
>
> This wouldn't be a big issue if we only flush the reactangle that gets
> modified. But without this patch set, we're flushing the full DRAM
> buffer on every u-boot text console character write, which means for
> every character (as that's the only API UEFI has).
>
> As a nice side effect, we speed up the normal U-Boot text console as
> well with this patch set, because even "normal" text prints that write
> for example a single line of text on the screen today flush the full
> frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

>
>
> >
> >>> This patch set implements the easiest mitigation against this problem:
> >>> Damage tracking. We remember the lowest common denominator region that was
> >>> touched since the last video_sync() call and only flush that. The most
> >>> typical writer to the frame buffer is the video console, which always
> >>> writes rectangles of characters on the screen and syncs afterwards.
> >>>
> >>> With this patch set applied, we reduce drawing a large grub menu (with
> >>> serial console attached for size information) on an RK3399-ROC system
> >>> at 1440p from 55 seconds to less than 1 second.
> >>>
> >>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>> overhead compared to before as well. So even x86 systems should be faster
> >>> with this now :).
> >>>
> >>>
> >>> Alternatives considered:
> >>>
> >>>1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >>>   so often. We are missing timers to do this generically.
> >>>
> >>>2) Double buffering - We could try to identify whether anything changed
> >>>   at all and only draw to the FB if it did. That would require
> >>>   maintaining a second buffer that we need to scan.
> >>>
> >>>3) Text buffer - Maintain a buffer of all text printed on the screen 
> >>> with
> >>>   respective location. Don't write if the old and new character are
> >>>   identical. This would limit applicability to text only and is an
> >>>   optimization on top of this patch set.
> >>>
> >>>4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>   changes. Only flush when it does. I'm not sure if this would waste
> >>>   more time, memory and cache than the current approach. It would make
> >>>   full screen updates much more expensive.
> > 5) Fix the bug mentioned above?
> >
> >> Changes in v5:
> >> - Add patch "video: test: Split copy frame buffer check into a function"
> >> - Add patch "video: test: Support checking copy frame buffer contents"
> >> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >> - Use xstart, ystart, xend, yend as names for damage region
> >> - Document damage struct and fields in struct video_priv comment
> >> - Return void from video_damage()
> >> - Fix undeclared priv error in video_sync()
> >> - Drop unused headers from video-uclass.c
> >> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >> - Call video_damage() also in video_fill_part()
> >> - Use met->baseline instead of priv->baseline
> >> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >> - Update console_rotate.c video_damage() calls to pass video tests
> >> - Remove mention about not having minimal damage for console_rotate.c
> >> - Add patch "video: test: Test video damage tracking via vidconsole"
> >> - Document new vdev field in struct efi_gop_obj comment
> >> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >> - Fix memmove() calls by removing the extra dev argument
> >> - Call video

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Alexander Graf



On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:


This patch set speeds up graphics output on ARM by a factor of 60x.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

It should not, see below.


Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily print 1000s of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

That is a bug somewhere in EFI.



Unfortunately not :). You may call it a bug in grub: It literally prints 
over space characters for every character in its menu that it wants 
cleared. On every text screen draw.


This wouldn't be a big issue if we only flush the reactangle that gets 
modified. But without this patch set, we're flushing the full DRAM 
buffer on every u-boot text console character write, which means for 
every character (as that's the only API UEFI has).


As a nice side effect, we speed up the normal U-Boot text console as 
well with this patch set, because even "normal" text prints that write 
for example a single line of text on the screen today flush the full 
frame buffer to DRAM.






This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that. The most
typical writer to the frame buffer is the video console, which always
writes rectangles of characters on the screen and syncs afterwards.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.

Version 2 also implements VIDEO_COPY using this mechanism, reducing its
overhead compared to before as well. So even x86 systems should be faster
with this now :).


Alternatives considered:

   1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
  so often. We are missing timers to do this generically.

   2) Double buffering - We could try to identify whether anything changed
  at all and only draw to the FB if it did. That would require
  maintaining a second buffer that we need to scan.

   3) Text buffer - Maintain a buffer of all text printed on the screen with
  respective location. Don't write if the old and new character are
  identical. This would limit applicability to text only and is an
  optimization on top of this patch set.

   4) Hash screen lines - Create a hash (sha256?) over every line when it
  changes. Only flush when it does. I'm not sure if this would waste
  more time, memory and cache than the current approach. It would make
  full screen updates much more expensive.

5) Fix the bug mentioned above?


Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention about not having minimal damage for console_rotate.c
- Add patch "video: test: Test video damage tracking via vidconsole"
- Document new vdev field in struct efi_gop_obj comment
- Remove video_sync_copy() also from video_fill(), video_fill_part()
- Fix memmove() calls by removing the extra dev argument
- Call video_sync() before checking copy_fb in video tests
- Imply VIDEO_DAMAGE for video drivers instead of selecting it
- Imply VIDEO_DAMAGE also for VIDEO_TIDSS

v4: https://lore.kernel.org/all/20230103215004.22646-1-ag...@csgraf.de/

Changes in v4:
- Move damage clear to patch "dm: video: Add damage tracking API"
- Simplify first damage logic
- Remove VIDEO_DAMAGE default for ARM
- Skip damage on EfiBltVideoToBltBuffer
- Add patch "video: Always compile cache flushing code"
- Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"

v3: https://lore.kernel.org/all/20221230195828.88134-1-ag...@csgraf.de/

Changes in v3:
- Adapt to always assume DM is used
- Adapt to always 

Re: [PATCH v5 00/13] Add video damage tracking

2023-08-21 Thread Simon Glass
Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak  wrote:
>
> This is a rebase of Alexander Graf's video damage tracking series, with
> some tests and other changes. The original cover letter is as follows:
>
> > This patch set speeds up graphics output on ARM by a factor of 60x.
> >
> > On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> > but need it accessible by the display controller which reads directly
> > from a later point of consistency. Hence, we flush the frame buffer to
> > DRAM on every change. The full frame buffer.

It should not, see below.

> >
> > Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> > that can take a while to flush out. This was reported by Da Xue with grub,
> > which happily print 1000s of spaces on the screen to draw a menu. Every
> > printed space triggers a cache flush.

That is a bug somewhere in EFI.

> >
> > This patch set implements the easiest mitigation against this problem:
> > Damage tracking. We remember the lowest common denominator region that was
> > touched since the last video_sync() call and only flush that. The most
> > typical writer to the frame buffer is the video console, which always
> > writes rectangles of characters on the screen and syncs afterwards.
> >
> > With this patch set applied, we reduce drawing a large grub menu (with
> > serial console attached for size information) on an RK3399-ROC system
> > at 1440p from 55 seconds to less than 1 second.
> >
> > Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> > overhead compared to before as well. So even x86 systems should be faster
> > with this now :).
> >
> >
> > Alternatives considered:
> >
> >   1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >  so often. We are missing timers to do this generically.
> >
> >   2) Double buffering - We could try to identify whether anything changed
> >  at all and only draw to the FB if it did. That would require
> >  maintaining a second buffer that we need to scan.
> >
> >   3) Text buffer - Maintain a buffer of all text printed on the screen with
> >  respective location. Don't write if the old and new character are
> >  identical. This would limit applicability to text only and is an
> >  optimization on top of this patch set.
> >
> >   4) Hash screen lines - Create a hash (sha256?) over every line when it
> >  changes. Only flush when it does. I'm not sure if this would waste
> >  more time, memory and cache than the current approach. It would make
> >  full screen updates much more expensive.

5) Fix the bug mentioned above?

>
> Changes in v5:
> - Add patch "video: test: Split copy frame buffer check into a function"
> - Add patch "video: test: Support checking copy frame buffer contents"
> - Add patch "video: test: Test partial updates of hardware frame buffer"
> - Use xstart, ystart, xend, yend as names for damage region
> - Document damage struct and fields in struct video_priv comment
> - Return void from video_damage()
> - Fix undeclared priv error in video_sync()
> - Drop unused headers from video-uclass.c
> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> - Call video_damage() also in video_fill_part()
> - Use met->baseline instead of priv->baseline
> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> - Update console_rotate.c video_damage() calls to pass video tests
> - Remove mention about not having minimal damage for console_rotate.c
> - Add patch "video: test: Test video damage tracking via vidconsole"
> - Document new vdev field in struct efi_gop_obj comment
> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> - Fix memmove() calls by removing the extra dev argument
> - Call video_sync() before checking copy_fb in video tests
> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>
> v4: https://lore.kernel.org/all/20230103215004.22646-1-ag...@csgraf.de/
>
> Changes in v4:
> - Move damage clear to patch "dm: video: Add damage tracking API"
> - Simplify first damage logic
> - Remove VIDEO_DAMAGE default for ARM
> - Skip damage on EfiBltVideoToBltBuffer
> - Add patch "video: Always compile cache flushing code"
> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>
> v3: https://lore.kernel.org/all/20221230195828.88134-1-ag...@csgraf.de/
>
> Changes in v3:
> - Adapt to always assume DM is used
> - Adapt to always assume DM is used
> - Make VIDEO_COPY always select VIDEO_DAMAGE
>
> v2: https://lore.kernel.org/all/20220609225921.62462-1-ag...@csgraf.de/
>
> Changes in v2:
> - Remove ifdefs
> - Fix ranges in truetype target
> - Limit rotate to necessary damage
> - Remove ifdefs from gop
> - Fix dcache range; we were flushing too much before
> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>
> v1: https://lore.kernel.org/all/20220606234336.5021-1-ag...@csgraf.de/
>
> Alexander Graf (9):
>   dm: