Hi,

On Fri, Apr 20, 2007 at 09:41:45AM +0300, ext Siarhei Siamashka wrote:
> 1. Lockups which look like cycling two sequential frames, very similar or the 
> same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 
> Also keypresses are not very responsive. A fix (or workaround) required
> changing XFlush to XSync in screen update code, now it looks a lot better.

I assume this is basically just a race condition, and it doesn't trigger
on other systems, because they're a lot quicker.
 
> 2. Switching windowed/fullscreen mode generally makes mplayer terminate
> with the following error messages:
> "X11 error: BadValue (integer parameter out of range for operation)"
> "Xlib: unexpected async reply (sequence 0x5db)!"
> A workaround to make this problem less frequent was a code addition which
> prevents screen updates until we get an Expose event notification.

Ditto.

> I really don't know much about X11 programming and have only started learning
> it, so your help with some advice may be very useful.

I mainly lurk on the server side, however.

> Looks like MPlayer's
> X11/Xv output code is a big mess with many tricks and workarounds added to
> work on different systems over time. Maybe it contains some bugs which get
> triggered on N800 only, but apparently this code is used for other systems
> without any problems. Can you try experimenting a bit with MPlayer (upstream
> release) yourself to check how it works with N800 xserver? Maybe it can reveal
> some xserver bugs which need to be fixed? Also if MPlayer has some apparently 
> bad X11 code, preparing a clean patch and submitting it upstream may be a 
> good idea.

Unfortunately, I don't have the time to do this.  Sorry.

> One more strange thing with Xv on N800 can be reproduced by trying to watch
> standard N800 demo video in MPlayer. It has an old familiar tearing line in
> the bottom part of the screen and the performance is very poor. The same file
> plays fine in the standard video player. The only difference is that mplayer
> respects video aspect ratio (this video is not precisely 15:9 but slightly
> off) and shows some small black bands above and below picture and 
> default video player scales it to fit the whole screen. Disabling aspect ratio
> in mplayer with -noaspect option also 'fixes' this problem.
> 
> Using benchmark option we get the following numbers:
> 
> # mplayer -benchmark -quiet Nokia_N800.avi
> [...]
> BENCHMARKs: VC:  33,271s VO:  66,768s A:   0,490s Sys:   5,703s =  106,232s
> BENCHMARK%: VC: 31,3189% VO: 62,8517% A:  0,4614% Sys:  5,3681% = 100,0000%
> BENCHMARKn: disp: 1732 (16,30 fps)  drop: 778 (30%)  total: 2510 (23,63 fps)
> 
> # mplayer -benchmark -quiet -noaspect Nokia_N800.avi
> [...]
> BENCHMARKs: VC:  32,226s VO:  14,350s A:   0,456s Sys:  55,699s =  102,731s
> BENCHMARK%: VC: 31,3694% VO: 13,9687% A:  0,4439% Sys: 54,2180% = 100,0000%
> BENCHMARKn: disp: 2501 (24,35 fps)  drop: 0 (0%)  total: 2501 (24,35 fps)
> 
> So when showing video with proper aspect ratio, we get tearing back and more
> than 4x slowdown in video output code (66,768s vs. 14,350s). This all results
> in 30% of frames dropped.

Okay, I'll take a look at this.  My guess is that the scaling we're
seeing prevents us from using the LCD controller's overlay, possibly
because it's done in software.
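
For reference, a quick sketch of the arithmetic behind the figures quoted
above. The function names are mine and purely illustrative; the inputs are
the VO times and frame counts from the two benchmark runs.

```c
/* Ratio of video-output time with aspect correction to the time with
 * -noaspect (both in seconds, taken from the BENCHMARKs lines). */
static double vo_slowdown(double vo_aspect_s, double vo_noaspect_s)
{
    return vo_aspect_s / vo_noaspect_s;
}

/* Percentage of frames dropped, from the BENCHMARKn line. */
static double drop_percent(unsigned dropped, unsigned total)
{
    return 100.0 * dropped / total;
}
```

vo_slowdown(66.768, 14.350) comes out at roughly 4.65, and
drop_percent(778, 2510) at roughly 31, which matches the 'more than 4x'
and '30%' figures above.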

> These were the 'usability' problems with Xv. Now we get to performance
> related issues. As YV12 is not natively supported by hardware, some 
> color format conversion and byte shuffling in video output code is
> unavoidable. It is a good idea to optimize this code if we need good
> performance for high resolution video playback. Color format conversion 
> can be optimized using assembly, for example maemo port of mplayer
> has a patch for assembly optimized yv12-> yuy2 (yuv420p -> yuyv422) 
> nonscaled conversion which provides a very noticeable ~50% improvement
> on Nokia 770:
> https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&rev=129&view=rev
> 
> Also here is a JIT accelerated scaler for yv12-> yuy2 (yuv420p -> yuyv422)
> conversion, it is very fast and supports pixel interpolation (good for image
> quality):
> https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

The primary conversion we do isn't planar -> packed (this is a fallback
for when the video is obscured), but from planar to another custom
planar format.  It would be good to get ARM assembly for the fallback
path, but most of the problem when using packed lies in having to
transfer the much larger amount of data over the bus.
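
To make the size difference concrete, here is a minimal, unoptimised
sketch of a non-scaled planar -> packed conversion plus the per-frame byte
counts. This is not the xserver code: the function names are made up, and
it assumes even width and height and tightly packed planes (stride equal
to width).

```c
#include <stddef.h>
#include <stdint.h>

/* Planar YUV 4:2:0: full-size Y plane plus quarter-size U and V planes,
 * i.e. 12 bits per pixel. */
static size_t yuv420_frame_bytes(size_t w, size_t h)
{
    return w * h * 3 / 2;
}

/* Packed YUY2 (4:2:2): 16 bits per pixel. */
static size_t yuy2_frame_bytes(size_t w, size_t h)
{
    return w * h * 2;
}

/* Convert 4:2:0 planes to packed YUY2 (byte order Y0 U Y1 V).
 * Each chroma row is reused for two consecutive luma rows.
 * Assumption: w and h are even and the planes have no padding. */
static void yv12_to_yuy2(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                         uint8_t *dst, size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++) {
        const uint8_t *yrow = y + row * w;
        const uint8_t *urow = u + (row / 2) * (w / 2);
        const uint8_t *vrow = v + (row / 2) * (w / 2);
        uint8_t *out = dst + row * w * 2;

        for (size_t x = 0; x < w; x += 2) {
            out[0] = yrow[x];
            out[1] = urow[x / 2];
            out[2] = yrow[x + 1];
            out[3] = vrow[x / 2];
            out += 4;
        }
    }
}
```

For an 800x480 frame that is 768000 bytes packed against 576000 planar,
so a third more data to push over the bus per frame.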

There's one optimisation that could be done for the YUV420 conversion
(the custom planar format that Hailstorm takes), which removes a branch,
ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
pixel), and unrolls a loop by half.  Might be interesting to see what
effect this has, but I think it'll still be rather small.

> I have seen your code in xserver which does the same job for downscaling, but
> in nonoptimized C and with much higher impact on quality. Using JIT scaler
> there can improve both image quality and performance a lot. My only
> concern is about instruction cache coherency. As ARM requires an explicit
> instruction cache flush for self-modifying or dynamically generated code, I
> wonder if using just mmap is safe (does it flush the cache for the allocated
> region of memory?). Maybe maemo kernel hackers/developers can help with this
> information?

'Downscaling' is overstating it: it just removes enough lines to get the
job done.  I don't believe we have enough CPU power to do proper
interpolation on that path.
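
Roughly speaking, line dropping looks like the sketch below. This is an
illustration only, not the actual xserver code: the function name and the
way the surviving lines are chosen (an even spread) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shrink an image vertically by copying only dst_h of the src_h source
 * lines, spread evenly, with no interpolation between lines.
 * stride is the number of bytes per line in both buffers. */
static void drop_lines(const uint8_t *src, size_t src_h,
                       uint8_t *dst, size_t dst_h, size_t stride)
{
    for (size_t row = 0; row < dst_h; row++) {
        size_t src_row = row * src_h / dst_h; /* nearest line at or above */
        memcpy(dst + row * stride, src + src_row * stride, stride);
    }
}
```

The per-line cost is a single memcpy, which is why it is cheap, and also
why the quality suffers compared with an interpolating scaler.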

Again, this is basically a 'fallback' path, and doesn't hit performance
in the normal case.

Off the top of my head, an mmap will only flush the dcache, not the
icache.  But I haven't tried this out.
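
Rather than relying on whatever mmap does, a JIT can make the flush
explicit. A minimal sketch, assuming GCC (or a compatible compiler) and
Linux: __builtin___clear_cache is a GCC builtin, publish_code is a
hypothetical helper name, and I haven't run this on the N800.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy generated machine code into a fresh anonymous mapping, flush the
 * instruction cache for that range, then make the mapping executable.
 * Returns NULL on failure. */
static void *publish_code(const void *code, size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* Explicitly synchronise the icache with the bytes just written. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

The returned pointer can then be cast to a function pointer of the
appropriate type by the caller.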

> It should be noted, that all this assembly optimized code was developed for
> Nokia 770. N800 has a much faster memory (up to 190MB/s memory copy
> performance vs. 110MB/s on Nokia 770) but requires a bit different
> optimizations (seems to need explicit prefetch with PLD instruction for
> reading data). I'm going to try making N800 optimized color format conversion
> functions a bit later.

Okay, cool.

> But here is one more problem. As color format conversion is done in xserver, 
> it will take a really long time before any such optimizations can be delivered
> to end users. Nokia seems to have an unpredictable (to outsiders) and slow
> release schedule.

Don't look at me. :)

> So for any performance optimization experiments which result in immediate
> video performance improvement, either direct framebuffer access should be 
> used again or it would be very nice if xserver could provide direct access to
> framebuffer (video planes) in yuy2 and that custom yuv420 format in one of the
> next firmware updates. The xserver itself should not do any excess memory copy
> operations as they degrade performance (and it does such copy for yuy2 at
> least).

'Direct framebuffer access'?  As in, just hand you a pointer to a
framebuffer somewhere and let you write straight to it?  As this would
require a firmware update anyway, I don't really see how this would
improve matters too much, and I really don't want to write any more
Maemo-specific extensions (I've been working very hard to kill XSP).

> Also I'm curious about that yuv420 format. From the comments in your code, it
> looks like it is different from what is described in Epson docs. That seems a
> bit weird.

Which Epson docs?

> Thanks for doing a great job supporting the maemo community, your comments
> have always been very informative and helpful here.

No worries. :)  Thanks for your work on the media player; the fallback
paths are, as you've noticed, not necessarily optimal.

Cheers,
Daniel

_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers