On 10.04.2018 23:45, Cyr, Aric wrote:
For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.

That seems like it would be a poorly written game if it flips like
that, unless it is explicitly trying to throttle the framerate for
some reason.  When a game presents a completed frame, it wants
that to happen as soon as possible.

What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.

Yes, I agree completely.  However, that's only truly relevant for fixed
refresh rate displays.

No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.

Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you,
so why bother?

I think Michel's point is that the monitor and GPU hardware *cannot*
really do this, because there's synchronization with audio to take into
account, which neither the GPU nor the monitor knows about.

How does it work fine today, given that all the kernel seems to know is 'current' or 
'current+1' vsyncs?
Presumably the applications somehow schedule all this just fine.
If this works without variable refresh for 60Hz, will it not work for a fixed-rate 
"48Hz" monitor (assuming a 24Hz video)?

You're right. I guess a better way to state the point is that it *doesn't* really work today with fixed refresh, but if we're going to introduce a new API, then why not do so in a way that can fix these additional problems as well?


Also, as I wrote separately, there's the case of synchronizing multiple
monitors.

For multi-monitor to work with VRR, the displays will have to be timing- and flip-
synchronized.
This is impossible for an application to manage; it needs driver/HW control, or 
you end up with one display flipping before the other, which looks terrible.
And definitely forget about multi-GPU without the professional workstation-type 
support needed to sync the displays across adapters.

I'm not a display expert, but I find it hard to believe that it's that difficult. Perhaps you can help us understand?

Say you have a multi-GPU system, and each GPU has multiple displays attached, and a single application is driving them all. The application queues flips for all displays with the same target_present_time_ns attribute. Starting at some time T, the application simply asks for the same present time T + i * 16666667 (or whatever) for frame i from all displays.

Of course it's to be expected that some (or all) of the displays will not be able to hit the target time on the first bunch of flips due to hardware limitations, but as long as the range of supported frame times is wide enough, I'd expect all of them to drift towards presenting at the correct time eventually, even across multiple GPUs, with this simple scheme.

Why would that not work to sync up all displays almost perfectly?
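To make the idea concrete, here's a minimal userspace sketch of that scheme. It
assumes a hypothetical per-CRTC "TARGET_PRESENT_TIME_NS" property (the property
name and its existence are assumptions for illustration only); for multiple GPUs
the commit is simply repeated on each device's fd:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

struct output {
    int fd;                       /* DRM fd of the GPU driving this display */
    uint32_t crtc_id;
    uint32_t plane_id;
    uint32_t fb_prop_id;          /* "FB_ID" property of the primary plane */
    uint32_t present_time_prop;   /* hypothetical "TARGET_PRESENT_TIME_NS" */
};

/* Queue frame i on every output with the same absolute target time
 * T + i * frame_time, regardless of which GPU drives which display. */
static int queue_frame(struct output *out, int n_outputs, uint32_t *fb_ids,
                       uint64_t t0_ns, uint64_t frame_time_ns, uint64_t i)
{
    uint64_t target = t0_ns + i * frame_time_ns;
    int ret = 0;

    for (int k = 0; k < n_outputs; k++) {
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        drmModeAtomicAddProperty(req, out[k].plane_id,
                                 out[k].fb_prop_id, fb_ids[k]);
        /* Same absolute target on every CRTC, even across GPUs. */
        drmModeAtomicAddProperty(req, out[k].crtc_id,
                                 out[k].present_time_prop, target);
        ret = drmModeAtomicCommit(out[k].fd, req,
                                  DRM_MODE_ATOMIC_NONBLOCK |
                                  DRM_MODE_PAGE_FLIP_EVENT, NULL);
        drmModeAtomicFree(req);
        if (ret)
            break;
    }
    return ret;
}

How quickly each display actually converges on the target would then be up to
the driver/HW, as described above.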


[snip]
Are there any real problems with exposing an absolute target present time?

Realistically, how far into the future are you requesting a presentation time? 
Won't it almost always be something like current_time+1000/video_frame_rate?
If so, why not just tell the driver to set 1000/video_frame_rate and have the 
GPU/monitor create nicely spaced VSYNCs for you that match the source content?

In fact, you probably wouldn't even need to change your video player at all, 
other than having it pass the target_frame_duration_ns.  You could consider 
this a 'hint' as you suggested, since it cannot be guaranteed in cases where your 
driver or HW doesn't support variable refresh.  If the target_frame_duration_ns 
hint is supported/applied, then the video app has nothing extra to do 
that it wouldn't already do for any arbitrary fixed-refresh-rate display.  If it is 
not supported (say the drm_atomic_check fails with -EINVAL or something), the 
video app can fall back and stop requesting a fixed target_frame_duration_ns.
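Roughly, that fallback flow could look like the sketch below. The
TARGET_FRAME_DURATION_NS property is hypothetical here; only the TEST_ONLY
probing pattern is standard atomic-KMS usage:

#include <stdbool.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Returns true if the driver accepts the (hypothetical) duration hint.
 * On failure the app rebuilds the request without the hint and behaves
 * exactly as it would on a fixed-refresh display. */
static bool try_frame_duration_hint(int fd, drmModeAtomicReq *req,
                                    uint32_t crtc_id, uint32_t duration_prop,
                                    double content_fps)
{
    drmModeAtomicAddProperty(req, crtc_id, duration_prop,
                             (uint64_t)(1000000000.0 / content_fps));

    /* Probe with TEST_ONLY first; an unsupported hint shows up as a
     * commit failure without touching the hardware. */
    return drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL) == 0;
}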

A fundamental problem I have with a target present time, though, is how to accommodate 
present times that are larger than one VSYNC period.  If my monitor has a 40Hz-60Hz 
variable refresh range, it's easy to translate "my content is 24Hz, repeat this next frame 
an integer number of times so that it lands within the monitor range".  
The driver fixes the display to an even 48Hz and everything is good (no worse than a 30Hz clip on a 
traditional 60Hz display anyway).  This frame-doubling is all hardware based and doesn't 
require any polling.
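For illustration, the repeat-count arithmetic is just this (a sketch, not
driver code):

/* Pick the smallest integer repeat count that puts the effective rate
 * inside the monitor's VRR range, e.g. 24 Hz content on a 40-60 Hz panel
 * gives 2x repeats = an even 48 Hz.  Returns 0 if no multiple fits. */
static int repeat_count(double content_hz, double vrr_min_hz, double vrr_max_hz)
{
    for (int n = 1; n * content_hz <= vrr_max_hz; n++)
        if (n * content_hz >= vrr_min_hz)
            return n;
    return 0;
}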

Now if you change that to "show my content in at least X nanoseconds", it can work on all 
displays, but the intent of the app is gone and the driver/GPU/display cannot optimize.  For example, 
the HDMI VRR spec defines a "CinemaVRR" mode where the target refresh rate error is accounted 
for based on a 0.1% deviation from the requested rate, and the v_total lines are incremented/decremented to 
compensate.  If we don't know the target rate, we will not be able to comply with this industry 
standard specification.
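A rough sketch of the v_total arithmetic involved (illustrative only, not the
spec's exact procedure): with a fixed pixel clock and h_total, the refresh rate
is pclk / (h_total * v_total), so the driver can nudge v_total by a line or two
to keep the achieved rate within ~0.1% of the requested one.

#include <math.h>
#include <stdbool.h>

/* Compute the v_total closest to the requested rate and report whether
 * the achieved rate stays within a 0.1% deviation. */
static bool v_total_for_rate(unsigned long pclk_khz, unsigned int h_total,
                             double target_hz, unsigned int *v_total_out)
{
    unsigned int v_total =
        (unsigned int)(pclk_khz * 1000.0 / (h_total * target_hz) + 0.5);
    double achieved_hz = pclk_khz * 1000.0 / ((double)h_total * v_total);

    *v_total_out = v_total;
    return fabs(achieved_hz - target_hz) / target_hz <= 0.001; /* 0.1% */
}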

Okay, that's interesting. Does this mean that the display driver still programs a refresh rate to some hardware register?

What if you want to initiate some CPU-controlled drift, i.e. you know you're targeting 2*24Hz, but you'd like to shift all flip times to be X ms later? Can you program the hardware for that, and how does it work? Do you have to twiddle the refresh rate, or can the hardware do it natively?

How about what I wrote in an earlier mail of having attributes:

- target_present_time_ns
- hint_frame_time_ns (optional)

... and if a video player set both, the driver could still do the optimizations you've explained?


Also, how would you manage an absolute target present time in the kernel?  I guess the app and 
driver need to use a common system clock or tick count, but when would you know to 'wake 
up' and execute the flip?  If you wait for VSYNC then you'll always time out on 
v_total_max (i.e. minimum refresh rate), check your time and see "yup, need to 
present now" and then flip.  Now your monitor has just jumped from the lowest refresh rate 
to something else, which can cause other problems.  If you use some timer, then you're 
burning needless power polling some counter and still wouldn't have the same accuracy you 
could achieve with a fixed duration.

For the clock, we just have to specify which one to take. I believe CLOCK_MONOTONIC makes the most sense for this kind of thing.
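E.g., the userspace side would just compute an absolute nanosecond value like
this (a sketch; the function name is made up):

#include <stdint.h>
#include <time.h>

/* Absolute CLOCK_MONOTONIC timestamp "delay_ns from now", in nanoseconds,
 * as it would be passed as the target present time. */
static uint64_t monotonic_target_ns(uint64_t delay_ns)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec + delay_ns;
}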

For your other questions, I'm afraid I just don't know enough about modern display hardware to give a really good answer, but with my naive understanding I would imagine something like the following:

1. When the atomic commit happens, the driver twiddles with the display timings to get the start of scanout for the next frame as close as possible to the specified target present time (I assume this is what v_total_max is about?)

2. The kernel then schedules a timer for the time when the display hardware is finished scanning out the previous frame and starts vblank.

3. In the handler for that timer, the kernel checks whether any fence associated to the new frame's surface has signaled. If yes, it changes the display hardware's framebuffer pointer to the new frame. Otherwise, it atomically registers for the handler to be run again when the fence is signaled.

3b. The handler should check if vblank has already ended (either due to extreme CPU overload or because the fence was signaled too late).
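In code this might look very roughly like the following (a sketch of steps 2-3b
only, with made-up structure and function names; the hrtimer/dma_fence usage is
just to illustrate the atomic "flip now or when the fence signals" part, not
real driver code):

#include <linux/hrtimer.h>
#include <linux/dma-fence.h>
#include <linux/kernel.h>

struct pending_flip {
	struct hrtimer vblank_timer;   /* armed for the start of vblank (step 2) */
	struct dma_fence *fence;       /* render fence of the new frame */
	struct dma_fence_cb fence_cb;
	/* + framebuffer, CRTC, target time, ... */
};

static void do_hw_flip(struct pending_flip *flip)
{
	/* Step 3: program the display hardware's framebuffer pointer.
	 * Step 3b: should first check that vblank hasn't already ended. */
}

static void flip_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	do_hw_flip(container_of(cb, struct pending_flip, fence_cb));
}

static enum hrtimer_restart flip_vblank_timer(struct hrtimer *t)
{
	struct pending_flip *flip =
		container_of(t, struct pending_flip, vblank_timer);

	/* dma_fence_add_callback() is atomic: it returns -ENOENT if the
	 * fence has already signaled, in which case we flip right away;
	 * otherwise flip_fence_cb() runs when the fence signals. */
	if (dma_fence_add_callback(flip->fence, &flip->fence_cb,
				   flip_fence_cb))
		do_hw_flip(flip);

	return HRTIMER_NORESTART;
}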

Actually, that last point (3b) makes me wonder how the case of "present ASAP" is actually implemented in hardware.

But again, all this is just from my naive understanding of the display hardware.

Cheers,
Nicolai


Regards,
   Aric

