On 10.04.2018 23:45, Cyr, Aric wrote:
For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.

That seems like it would be a poorly written game if it flips like
that, unless it is explicitly trying to throttle the framerate for
some reason.  When a game presents a completed frame, it wants
that to happen as soon as possible.

What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.

Yes, I agree completely.  However, that's only truly relevant for fixed
refresh rate displays.

No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.

Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you,
so why bother?

I think Michel's point is that the monitor and GPU hardware *cannot*
really do this, because there's synchronization with audio to take into
account, which neither the GPU nor the monitor knows about.

How does it work fine today, given that all the kernel seems to know is 'current' or 
'current+1' vsyncs?
Presumably the applications somehow schedule all this just fine.
If this works without variable refresh for 60Hz, will it not work for a fixed-rate 
"48Hz" monitor (assuming a 24Hz video)?

You're right. I guess a better way to state the point is that it *doesn't* really work today with fixed refresh, but if we're going to introduce a new API, then why not do so in a way that can fix these additional problems as well?


Also, as I wrote separately, there's the case of synchronizing multiple
monitors.

For multi-monitor to work with VRR, the displays will have to be timing- and flip-
synchronized.
This is impossible for an application to manage; it needs driver/HW control, or 
you end up with one display flipping before the other, which looks terrible.
And definitely forget about multi-GPU without the professional workstation-type 
support needed to sync the displays across adapters.

I'm not a display expert, but I find it hard to believe that it's that difficult. Perhaps you can help us understand?

Say you have a multi-GPU system, and each GPU has multiple displays attached, and a single application is driving them all. The application queues flips for all displays with the same target_present_time_ns attribute. Starting at some time T, the application simply asks for the same present time T + i * 16666667 (or whatever) for frame i from all displays.

Of course it's to be expected that some (or all) of the displays will not be able to hit the target time on the first bunch of flips due to hardware limitations, but as long as the range of supported frame times is wide enough, I'd expect all of them to drift towards presenting at the correct time eventually, even across multiple GPUs, with this simple scheme.

Why would that not work to sync up all displays almost perfectly?
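To make the idea concrete, here's a minimal userspace sketch of that scheme. It
assumes a hypothetical per-CRTC "TARGET_PRESENT_TIME_NS" property (the property
name and its existence are assumptions for illustration only); for multiple GPUs
the commit is simply repeated on each device's fd:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

struct output {
    int fd;                       /* DRM fd of the GPU driving this display */
    uint32_t crtc_id;
    uint32_t plane_id;
    uint32_t fb_prop_id;          /* "FB_ID" property of the primary plane */
    uint32_t present_time_prop;   /* hypothetical "TARGET_PRESENT_TIME_NS" */
};

/* Queue frame i on every output with the same absolute target time
 * T + i * frame_time, regardless of which GPU drives which display. */
static int queue_frame(struct output *out, int n_outputs, uint32_t *fb_ids,
                       uint64_t t0_ns, uint64_t frame_time_ns, uint64_t i)
{
    uint64_t target = t0_ns + i * frame_time_ns;
    int ret = 0;

    for (int k = 0; k < n_outputs; k++) {
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        drmModeAtomicAddProperty(req, out[k].plane_id,
                                 out[k].fb_prop_id, fb_ids[k]);
        /* Same absolute target on every CRTC, even across GPUs. */
        drmModeAtomicAddProperty(req, out[k].crtc_id,
                                 out[k].present_time_prop, target);
        ret = drmModeAtomicCommit(out[k].fd, req,
                                  DRM_MODE_ATOMIC_NONBLOCK |
                                  DRM_MODE_PAGE_FLIP_EVENT, NULL);
        drmModeAtomicFree(req);
        if (ret)
            break;
    }
    return ret;
}

How quickly each display actually converges on the target would then be up to
the driver/HW, as described above.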


[snip]
Are there any real problems with exposing an absolute target present time?

Realistically, how far into the future are you requesting a presentation time? 
Won't it almost always be something like current_time+1000/video_frame_rate?
If so, why not just tell the driver to set 1000/video_frame_rate and have the 
GPU/monitor create nicely spaced VSYNCs for you that match the source content?

In fact, you probably wouldn't even need to change your video player at all, 
other than having it pass the target_frame_duration_ns.  You could consider 
this a 'hint' as you suggested, since it cannot be guaranteed in cases where your 
driver or HW doesn't support variable refresh.  If the target_frame_duration_ns 
hint is supported/applied, then the video app has nothing extra to do 
that it wouldn't already do for any arbitrary fixed-refresh-rate display.  If it is 
not supported (say the drm_atomic_check fails with -EINVAL or something), the 
video app can fall back and stop requesting a fixed target_frame_duration_ns.
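Roughly, that fallback flow could look like the sketch below. The
TARGET_FRAME_DURATION_NS property is hypothetical here; only the TEST_ONLY
probing pattern is standard atomic-KMS usage:

#include <stdbool.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Returns true if the driver accepts the (hypothetical) duration hint.
 * On failure the app rebuilds the request without the hint and behaves
 * exactly as it would on a fixed-refresh display. */
static bool try_frame_duration_hint(int fd, drmModeAtomicReq *req,
                                    uint32_t crtc_id, uint32_t duration_prop,
                                    double content_fps)
{
    drmModeAtomicAddProperty(req, crtc_id, duration_prop,
                             (uint64_t)(1000000000.0 / content_fps));

    /* Probe with TEST_ONLY first; an unsupported hint shows up as a
     * commit failure without touching the hardware. */
    return drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL) == 0;
}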

A fundamental problem I have with a target present time, though, is how to accommodate 
present times that are larger than one VSYNC period.  If my monitor has a 40Hz-60Hz 
variable refresh range, it's easy to translate "my content is 24Hz, repeat this next frame 
an integer number of times so that it lands within the monitor range".  
The driver fixes the display to an even 48Hz and everything is good (no worse than a 30Hz clip on a 
traditional 60Hz display anyway).  This frame-doubling is all hardware based and doesn't 
require any polling.
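For illustration, the repeat-count arithmetic is just this (a sketch, not
driver code):

/* Pick the smallest integer repeat count that puts the effective rate
 * inside the monitor's VRR range, e.g. 24 Hz content on a 40-60 Hz panel
 * gives 2x repeats = an even 48 Hz.  Returns 0 if no multiple fits. */
static int repeat_count(double content_hz, double vrr_min_hz, double vrr_max_hz)
{
    for (int n = 1; n * content_hz <= vrr_max_hz; n++)
        if (n * content_hz >= vrr_min_hz)
            return n;
    return 0;
}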

Now if you change that to "show my content in at least X nanoseconds", it can work on all 
displays, but the intent of the app is gone and the driver/GPU/display cannot optimize.  For example, 
the HDMI VRR spec defines a "CinemaVRR" mode where the target refresh rate error is accounted 
for based on a 0.1% deviation from the requested rate, and the v_total lines are incremented/decremented to 
compensate.  If we don't know the target rate, we will not be able to comply with this industry 
standard specification.
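A rough sketch of the v_total arithmetic involved (illustrative only, not the
spec's exact procedure): with a fixed pixel clock and h_total, the refresh rate
is pclk / (h_total * v_total), so the driver can nudge v_total by a line or two
to keep the achieved rate within ~0.1% of the requested one.

#include <math.h>
#include <stdbool.h>

/* Compute the v_total closest to the requested rate and report whether
 * the achieved rate stays within a 0.1% deviation. */
static bool v_total_for_rate(unsigned long pclk_khz, unsigned int h_total,
                             double target_hz, unsigned int *v_total_out)
{
    unsigned int v_total =
        (unsigned int)(pclk_khz * 1000.0 / (h_total * target_hz) + 0.5);
    double achieved_hz = pclk_khz * 1000.0 / ((double)h_total * v_total);

    *v_total_out = v_total;
    return fabs(achieved_hz - target_hz) / target_hz <= 0.001; /* 0.1% */
}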

Okay, that's interesting. Does this mean that the display driver still programs a refresh rate to some hardware register?

What if you want to initiate some CPU-controlled drift, i.e. you know you're targeting 2*24Hz, but you'd like to shift all flip times to be X ms later? Can you program the hardware for that, and how does it work? Do you have to twiddle the refresh rate, or can the hardware do it natively?

How about what I wrote in an earlier mail of having attributes:

- target_present_time_ns
- hint_frame_time_ns (optional)

... and if a video player set both, the driver could still do the optimizations you've explained?


Also, how would you manage an absolute target present time in the kernel?  I guess the app and 
driver need to use a common system clock or tick count, but when would you know to 'wake 
up' and execute the flip?  If you wait for VSYNC then you'll always time out on 
v_total_max (i.e. minimum refresh rate), check your time and see "yup, need to 
present now" and then flip.  Now your monitor has just jumped from the lowest refresh rate 
to something else, which can cause other problems.  If you use some timer, then you're 
burning needless power polling some counter and still wouldn't have the same accuracy you 
could achieve with a fixed duration.

For the clock, we just have to specify which one to take. I believe CLOCK_MONOTONIC makes the most sense for this kind of thing.
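E.g., the userspace side would just compute an absolute nanosecond value like
this (a sketch; the function name is made up):

#include <stdint.h>
#include <time.h>

/* Absolute CLOCK_MONOTONIC timestamp "delay_ns from now", in nanoseconds,
 * as it would be passed as the target present time. */
static uint64_t monotonic_target_ns(uint64_t delay_ns)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec + delay_ns;
}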

For your other questions, I'm afraid I just don't know enough about modern display hardware to give a really good answer, but with my naive understanding I would imagine something like the following:

1. When the atomic commit happens, the driver twiddles with the display timings to get the start of scanout for the next frame as close as possible to the specified target present time (I assume this is what v_total_max is about?)

2. The kernel then schedules a timer for the time when the display hardware is finished scanning out the previous frame and starts vblank.

3. In the handler for that timer, the kernel checks whether any fence associated to the new frame's surface has signaled. If yes, it changes the display hardware's framebuffer pointer to the new frame. Otherwise, it atomically registers for the handler to be run again when the fence is signaled.

3b. The handler should check if vblank has already ended (either due to extreme CPU overload or because the fence was signaled too late).
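In code this might look very roughly like the following (a sketch of steps 2-3b
only, with made-up structure and function names; the hrtimer/dma_fence usage is
just to illustrate the atomic "flip now or when the fence signals" part, not
real driver code):

#include <linux/hrtimer.h>
#include <linux/dma-fence.h>
#include <linux/kernel.h>

struct pending_flip {
	struct hrtimer vblank_timer;   /* armed for the start of vblank (step 2) */
	struct dma_fence *fence;       /* render fence of the new frame */
	struct dma_fence_cb fence_cb;
	/* + framebuffer, CRTC, target time, ... */
};

static void do_hw_flip(struct pending_flip *flip)
{
	/* Step 3: program the display hardware's framebuffer pointer.
	 * Step 3b: should first check that vblank hasn't already ended. */
}

static void flip_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	do_hw_flip(container_of(cb, struct pending_flip, fence_cb));
}

static enum hrtimer_restart flip_vblank_timer(struct hrtimer *t)
{
	struct pending_flip *flip =
		container_of(t, struct pending_flip, vblank_timer);

	/* dma_fence_add_callback() is atomic: it returns -ENOENT if the
	 * fence has already signaled, in which case we flip right away;
	 * otherwise flip_fence_cb() runs when the fence signals. */
	if (dma_fence_add_callback(flip->fence, &flip->fence_cb,
				   flip_fence_cb))
		do_hw_flip(flip);

	return HRTIMER_NORESTART;
}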

Actually, that last point (3b) makes me wonder how the case of "present ASAP" is actually implemented in hardware.

But again, all this is just from my naive understanding of the display hardware.

Cheers,
Nicolai


Regards,
   Aric

