On Wednesday 02 May 2007 12:54, Daniel Stone wrote:
> > The 'framebuffer' is just the ordinary system memory, converting color
> > format and copying data to framebuffer will be done with the same
> > performance as simulated in this test. RFBI performance is only critical
> > for asynchronous DMA data transfer to LCD controller which does not
> > introduce any overhead and is performed at the same time as ARM core is
> > doing some other work (decoding the next frame). RFBI performance matters
> > only if data transfer to LCD is still not complete at the time when the
> > next frame is already decoded and is ready to be displayed. When playing
> > video, ARM core and LCD controller are almost always working at the same
> > time performing different tasks in parallel. I think I had already
> > explained these details in [1]
>
> Right.  My point is that the numbers you're showing -- while very good,
> don't get me wrong -- won't necessarily have a huge direct impact on
> video playback.  Particularly if you want to avoid tearing.

I have no idea what other proof would be enough for you. You already got all
the numbers, and even benchmarks with a patched xserver. They all confirm
the video output performance improvement.

> > So now the results of the tests are consistent - when doing video output,
> > most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420'
> > function.
>
> Well, either that, or just waiting for RFBI transfers to complete.

You need to wait a bit before displaying the next frame anyway, and
the period between frames for 30 fps video usually exceeds the transfer
completion time. If you want some numbers, a 640x480 YUV420 (12bpp)
screen update now takes 25ms without the tearsync flag enabled
(OMAPFB_FORMAT_FLAG_TEARSYNC for the OMAPFB_UPDATE_WINDOW
ioctl) and 25-42ms with tearsync. For 30 fps video, the period between
screen updates is normally 33ms. For playing video, we
initiate an RFBI transfer, wait till it completes, perform YV12->YUV420 color
format conversion (which should take less than 4ms for 640x480
considering benchmark results), wait till it is time to display the next
frame and start an RFBI transfer again. For 30 fps video 25ms+4ms is less
than 33ms, so without tearsync enabled, any 640x480 video should play
fine (as far as video output performance is concerned).

With tearsync enabled, we should add the time spent waiting for vertical
sync in the LCD controller, which breaks our nice numbers. Worst case (17ms
wait for retrace + 25ms for actual data transfer) takes more time than the
33ms between frames. We can be saved if the LCD controller internal refresh
rate is really 60Hz: in this case video playback will automagically
synchronize to the LCD refresh rate and each frame will be processed exactly
within 2 LCD refresh cycles (by the time we want to display a video frame,
the next vertical retrace will be near and we will not lose much time
waiting for it). If decoding time for each frame never exceeds 28-29ms
(which is a tough limitation, as cpu usage is not uniform), video playback
without dropping any frames will be possible even with tearsync enabled.
That's what I'm investigating now. In any case, getting ideal 24 fps
playback will be a bit easier.
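
To make the arithmetic above easier to check, here is a minimal sketch in
Python. It only restates the numbers from this message; the 60Hz refresh
rate is an assumption (as said above, it is not confirmed), and the function
names are made up for illustration.

```python
# Timing figures quoted in this message, in milliseconds.
TRANSFER_MS = 25.0   # 640x480 YUV420 screen update over RFBI
CONVERT_MS = 4.0     # upper bound for YV12->YUV420 conversion of one frame
REFRESH_HZ = 60.0    # ASSUMED LCD refresh rate, not confirmed


def frame_period_ms(fps):
    """Time available between two consecutive frames."""
    return 1000.0 / fps


def output_time_ms(retrace_wait_ms=0.0):
    """Optional wait for vertical retrace, then transfer, then conversion."""
    return retrace_wait_ms + TRANSFER_MS + CONVERT_MS


def fits(fps, retrace_wait_ms=0.0):
    """True if video output alone fits inside one frame period."""
    return output_time_ms(retrace_wait_ms) <= frame_period_ms(fps)


# Worst case with tearsync: a full refresh cycle (about 17ms) is lost.
worst_retrace_ms = 1000.0 / REFRESH_HZ

print(fits(30))                    # True: 29ms against 33.3ms
print(fits(30, worst_retrace_ms))  # False: about 45.7ms against 33.3ms
```

This covers video output time only. Decoding runs on the ARM core while the
transfer is in progress, so it is not part of these sums.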

I hope all these explanations are clear now. And this is not just a theory,
but already confirmed by some experiments and practical tests.

> I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

Thanks, that is what I would consider 'additional tips and tricks' :)

It is good to know that maemo 3.x development can also be done with an
older scratchbox (I have 0.9.8.8 installed now), I'll try it without upgrading
scratchbox then.

> > Well, anyway, everything worked perfectly and I could play 640x480 video
> > on N800 with the following statistics:
> >
> > VIDEO:  [DIVX]  640x480  12bpp  23.976 fps  886.7 kbps (108.2 kbyte/s)
> > ...
> > BENCHMARKs: VC:  87,757s VO:   8,712s A:   1,314s Sys:   3,835s = 101,618s
> > BENCHMARK%: VC: 86,3592% VO:  8,5736% A:  1,2932% Sys:  3,7740% = 100,0000%
> > BENCHMARKn: disp: 2044 (20,11 fps)  drop: 355 (14%)  total: 2399 (23,61 fps)
> >
> > As you see, mplayer took 8.712 seconds to display 2044 VGA resolution
> > frames. If we do the necessary calculations, that's 72 million pixels
> > per second, quite close to 'yv12_to_yuv420_line_armv6' capabilities
> > limit, so this function is the only major contributor to video output
> > time. Video output took much less time than decoding, so it proves that
> > video output overhead can be reduced to minimum (in this test tearsync
> > was not used though).
>
> I'd be curious to see the results from this with tearsync _enabled_?
> i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX
> ioctl before you start writing to memory again.  This is basically the
> limiter for us at this stage.

That's exactly how MPlayer works. It always waits on OMAPFB_SYNC_GFX 
before filling framebuffer with the data for the next frame. Not issuing
OMAPFB_SYNC_GFX would introduce *artificial* tearing not related to sync
with LCD refresh. Actually for this 24 fps video, OMAPFB_SYNC_GFX is not a
problem. The detailed explanation with some numbers was posted above.
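
The ordering described above can be sketched roughly like this. It is not
MPlayer's actual code: the three callables are hypothetical stand-ins for
the OMAPFB_SYNC_GFX ioctl, the framebuffer fill and the
OMAPFB_UPDATE_WINDOW ioctl, and only the order of the calls is the point.

```python
def play(frames, wait_for_transfer, fill_framebuffer, start_update):
    """Show frames without overwriting data that is still being transferred.

    All three callables are hypothetical stand-ins:
      wait_for_transfer()  - would issue the OMAPFB_SYNC_GFX ioctl
      fill_framebuffer(f)  - would convert and copy frame f to the framebuffer
      start_update()       - would issue the OMAPFB_UPDATE_WINDOW ioctl
    """
    for frame in frames:
        # Always wait first: writing while the previous transfer is still
        # running would mix two frames in one screen update.
        wait_for_transfer()
        fill_framebuffer(frame)
        start_update()


# Record the call order with simple logging stand-ins.
log = []
play(
    ["f0", "f1"],
    wait_for_transfer=lambda: log.append("sync"),
    fill_framebuffer=lambda f: log.append("fill " + f),
    start_update=lambda: log.append("update"),
)
print(log)
```

Every "fill" in the log is preceded by a "sync", which is the property that
avoids the artificial tearing mentioned above.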

When I'm talking about tearsync, I'm talking exclusively about
OMAPFB_FORMAT_FLAG_TEARSYNC for screen update ioctls.

> > When tearsync comes into action, everything gets a bit more complicated.
> > I'm still investigating its impact on video playback performance.
>
> 'Not good'. :)

Video quality is still quite good even without tearsync (in my definition),
but not perfect. With your definition, tearsync is always enabled in MPlayer
anyway, on Nokia 770 too :)
_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
