CPU video decoding test, Re: [maemo-developers] gstreamer launcher (proper video sink?)
David D. Hagood wrote:
> In short, clock per clock, I suspect the DSP can do more video work than the ARM can. Now, you *could* make the argument that if the workload of decoding video could somehow be split between the 2 processor cores there might be some benefit - maybe leave the video scaling to the ARM but let the video coding be done by the DSP core. However, the real limiting factor may not even be MIPS, but rather bandwidth

[snip]

> Remember what step 0 of optimization is: MEASURE IT FIRST! Until somebody can actually measure where all the time is going, making pronouncements like "It's slow because of X - I just know it" are the root of all evil - you spend a great deal of time tweaking that one thing only to find out that it was only 1% of the time to begin with.

Hello again,

I tried compiling the current CVS version of ffmpeg (http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/?cvsroot=FFMpeg) to stop producing pure theories and see how video really plays on the OMAP 1710 CPU in the N770. ffmpeg compiles fine in scratchbox with a few configuration tweaks and also includes a simple SDL-based player called ffplay. libavcodec contains some ARM assembly optimizations for the armv4l architecture. When these are enabled, video playback in ffplay is very good.

When converting video for the Tungsten T2 (OMAP 1510) and the TCPMP player I usually use something like this:

    mencoder.exe %NAME%.avi -audio-preload 0.8 -delay 0.1 -af volnorm -srate 44100 -oac mp3lame -lameopts mode=2:cbr:br=128 -noodml -vf scale=320:240 -sws 9 -ovc lavc -lavcopts vcodec=mpeg4:vhq:vmax_b_frames=0:vbitrate=304 -ffourcc DIVX -o %NAME%_palm.avi

Such videos play adequately at 25 fps on the T2. In some scenes frames are skipped, but generally the playback is good. I used the same files on the N770, and while they also play acceptably in the N770 video player, it is a bit worse than on the T2, and in more complex scenes the video player hangs randomly. When this happens, video is unplayable for some time (10-20 seconds?) until the DSP is automatically restarted.

When I tried the same videos with ffplay, they play fine when the audio is turned off (ffplay -an video.avi). It looks like the ffmpeg libavcodec mp3 implementation is not optimized for ARM (uses floats?). Video plays scaled to 640x480 by default, and even at this resolution playback is fluent. CPU utilization is between 50% and 100%, mostly around 75% (just a guess from the load plugin applet). At 320x240, 'ffplay -an -x 320 -y 240 video.avi' (which is what the default N770 video player does, as it uses HW pixel doubling), it is even better. At this resolution the CPU is rarely at 100%, mostly at 50-75%.

You can download an ffplay binary compiled for the N770 from http://fanoush.webpark.cz/maemo/ffplay.gz for a quick test with your videos. If you get "access denied", paste the URL directly into the URL bar; webpark.cz free hosting doesn't like direct links to binaries (= foreign HTTP referer field). Or just check out ffmpeg from CVS and compile it yourself.

Of course this is just a proof of concept, as audio is not usable, but it proves the CPU is fast enough to decode mp4 video better (= faster, more stable) than the current DSP implementation. Further, it also proves that the 'bandwidth problem' is not so bad. Even at 640x480, blitting to video memory seems to be good enough.

Maybe there is also some room for further optimization. The ffmpeg code doesn't use the EDSP instructions available in ARMv5TE (maybe they are not so useful in reality?), and it is also not the fastest implementation even for ARMv4. The TCPMP player uses a different and faster mp4 decoder. It includes an optional ffmpeg plugin, but only as a slower, more compatible implementation. Also, I'm not sure how optimized the SDL code on the N770 is. From the kernel framebuffer source (drivers/video/omap/hwa742.c) it looks like the display supports YUV surfaces directly, but maybe ffplay and SDL use RGB, so there are one or two extra YUV-RGB conversion steps.

Frantisek ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
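Editor's note: the CPU figures above are read off the load applet. A minimal sketch of how the same number could be measured more directly is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the busy and total jiffies. This assumes a Linux `/proc/stat` in the standard layout and some host where Python is available; the function names are illustrative and not part of any Maemo tool.

```python
import time


def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Layout: user nice system idle [iowait irq softirq steal ...].
    # Only the first eight counters are used; later guest counters are
    # already included in 'user' and would be double counted.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total


def utilization(before, after):
    """Percentage of non-idle time between two 'cpu' line samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


def measure(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return utilization(first, second)
```

Calling `measure(1.0)` repeatedly while `ffplay -an video.avi` is running would give a rough per-second utilization figure. Without Python on the device, the same arithmetic can be done by hand from two `cat /proc/stat` outputs.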
Re: [maemo-developers] gstreamer launcher (proper video sink?)
Josep Torra Valles wrote:
> / $ cat /sys/bus/dsptask/devices/dsptask11/devname
> mp2dec
>
> Is it an mpeg2 audio decoder implemented on DSP? Can you give me info about how I could use it in my project?

The sources of the gstreamer plugins that use the DSP would be extremely useful for this, but they are not available. Multimedia may be part of the next Maemo release, but I'm afraid it still may not include these sources.

Frantisek
Re: [maemo-developers] gstreamer launcher (proper video sink?)
[EMAIL PROTECTED] wrote:
> Both of them works! : )

That's cool :)

> Question:
> 1) Is it correct for me to use the dspfbsink directly? (Frame buffer) Any pre-processing necessary?

I'd guess it isn't, but I don't know. You would be missing the new window, so it looks like there should be another way.

> 2) The w.mpg plays correctly on the bundled 770 movie player, but pretty slowly with a lot of frame skips. Is the CPU of Nokia 770 fast enough to handle more advanced codecs, like xvid?

All the decoding is done by the DSP. This is IMO a bad design choice for (advanced) video. The DSP is simply overused and too slow for this. When playing the Ice Age trailer you can see with the load applet that the CPU sits almost idle while the DSP is struggling to decode everything. This makes sense for background sound playback, but not for video. You hardly play video in the background :-)

It would be interesting to hook in an mp4 gstreamer plugin that decodes video on the main CPU. This could result in much higher framerates or better resolution, and better utilization of both the main CPU and the DSP. Or is there some catch that makes doing it on the DSP the better choice? AFAIK the main ARM CPU should be better suited for the task than the DSP.

BTW, how much dedicated video memory is there? Can the video driver do double buffering directly in video memory? This would be useful for video playback too.

Frantisek
Re: [maemo-developers] gstreamer launcher (proper video sink?)
On 4/26/06, Frantisek Dufka [EMAIL PROTECTED] wrote:
> All the decoding is done by the DSP. This is IMO bad design choice for (advanced) video. DSP is simply overused and too slow for this. When playing the ice age trailer you can notice with load applet that CPU sits almost idle while DSP is struggling to decode everything. This makes sense for background sound playback, but not for video. You hardly play video on the background :-)

I think we should have modular options for using the DSP: one to decode on the DSP, another to decode and show...

The DSP makes sense for decoding since it has some useful instructions, like floating-point instructions, but it's not good for doing everything. As for X refresh sync, we can see tearing while viewing videos in the Maemo Video Player, which could be avoided by using double buffering and blits handled by the main CPU. With this we could also use the CPU to do some overlays, like an OSD.

But I wouldn't put everything on the CPU, since the GUI would become too unresponsive and maybe the battery would drain (just guessing here).

--
Gustavo Sverzut Barbieri
Re: [maemo-developers] gstreamer launcher (proper video sink?)
Frantisek Dufka wrote:
> better resolution and better utilization of both main CPU and DSP. Or is there some catch why doing it on DSP makes better sense? AFAIK the main ARM CPU should be better suited for the task than the DSP.

While I've not started working in depth on the OMAP1710, I do have experience with signal processing and embedded programming - so take this for what it's worth.

The ARM does not have floating point, only integer math, so any processing of the video has to be done in integer, and a great many of the operations really don't fit that well.

The DSP may not have floating point, but it has fixed point, which is a hybrid of integer and floating point where 1.0 is represented (approximately) by 0x7FFF and -1.0 by 0x8000. The DSP automatically handles shifting multiplies, so that 0x7FFF * 0x7FFF comes out as approximately 0x7FFF (1.0 * 1.0 = 1.0), so you can do many sorts of non-integer math very quickly.

Also, many of the operations in video codecs are multiply-and-accumulate operations ( a += b*c ), which DSPs have a single instruction to do.

In short, clock per clock, I suspect the DSP can do more video work than the ARM can.

Now, you *could* make the argument that if the workload of decoding video could somehow be split between the 2 processor cores there might be some benefit - maybe leave the video scaling to the ARM but let the video coding be done by the DSP core.

However, the real limiting factor may not even be MIPS, but rather bandwidth - the speed at which the ARM can pull the data out of flash (esp. MMC - that's not exactly the speediest interface in the world), and the speed at which data can be moved into the video buffer.

Remember what step 0 of optimization is: MEASURE IT FIRST! Until somebody can actually measure where all the time is going, making pronouncements like "It's slow because of X - I just know it" are the root of all evil - you spend a great deal of time tweaking that one thing only to find out that it was only 1% of the time to begin with.
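Editor's note: the fixed-point format described above is usually called Q15. The following is a small sketch of Q15 multiply and multiply-accumulate, written in Python purely to show the arithmetic; it is not DSP or ARM code, and the helper names are illustrative. It also shows that 0x7FFF is just under 1.0, so the exact product of 0x7FFF with itself is 0x7FFE rather than 0x7FFF.

```python
Q15_MAX = 0x7FFF    # largest positive Q15 value, just under 1.0
Q15_MIN = -0x8000   # exactly -1.0 (bit pattern 0x8000 as a signed 16-bit value)


def saturate(value):
    """Clamp a result to the signed 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in [-1.0, 1.0) to Q15, saturating at the limits."""
    return saturate(int(round(x * 32768)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768.0


def q15_mul(a, b):
    """Fractional multiply: 16x16 -> 32-bit product, then drop 15 fraction bits."""
    return saturate((a * b) >> 15)


def q15_dot(xs, ys):
    """Multiply-accumulate (acc += b*c) in a wide accumulator, scaled once at the end."""
    acc = 0
    for b, c in zip(xs, ys):
        acc += b * c          # one MAC per element; acc holds Q30 values
    return saturate(acc >> 15)
```

Keeping the accumulator wide and shifting only once at the end mirrors how a hardware MAC unit avoids losing precision on every step.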
Re: [maemo-developers] gstreamer launcher (proper video sink?)
David D. Hagood wrote:
> Also, many of the operations in video codecs are multiply-and-accumulate operations ( a += b*c ), which DSPs have a single instruction to do. In short, clock per clock, I suspect the DSP can do more video work than the ARM can.

Well, the OMAP 1710 is ARMv5TEJ - the letter E means EDSP extensions, so it has MAC operations too.

> Remember what step 0 of optimization is: MEASURE IT FIRST! Until somebody can actually measure where all the time is going, making pronouncements like "It's slow because of X - I just know it" are the root of all evil - you spend a great deal of time tweaking that one thing only to find out that it was only 1% of the time to begin with.

True. Well, as for the measuring, I only know that my Tungsten T2 (OMAP 1510, 168 MHz) can play 320x240, 25 fps, 300 kbit/s mp4 videos better than my N770. And TCPMP (the Palm OS video player) uses the ARM core only. So I suppose even an ARMv4 core is good enough. I guess a 250 MHz ARMv5 together with the DSP helping with decoding audio (and maybe some video step - blitting with color space conversion?) could do much better.

You can watch /sys/devices/platform/dsp/loadinfo while playing video; the DSP is constantly at 100%.

But I admit these are just my theories. I still have had no time to investigate the gstreamer framework on the N770 in detail.

Frantisek
Re: [maemo-developers] gstreamer launcher (proper video sink?)
On 26 Apr 2006 at 14:47, Frantisek Dufka wrote:
> True. Well, as for the measuring, I only know my Tungsten T2 (OMAP 1510,168Mhz) can play 320x240, 25FPS, 300Kbits mp4 videos better than my N770. And TCPMP (palmos video player) uses ARM core only. So I suppose even ARM4 core is good enough. I guess 250Mhz ARM5 together with DSP helping with decoding audio (and maybe some video step - blitting with color space conversion?) could do much better.

The OMAP 1510 has an internal framebuffer. The framebuffer in the N770 is external (I think) and has to be accessed via the 16-bit memory bus. There are internal caches, but program code and data also have to be transported via the 16-bit memory bus. Maybe that's the bottleneck. The OMAP 1510 also has just a 16-bit bus, but it has an internal frame buffer, which may be accessed faster without blocking the external 16-bit bus. Thus the DSP and the ARM CPU core share the same memory bus.

The cache sizes and built-in memories of the OMAP 1710 are:

TMS320C55x DSP core subsystem
* Up to 220 MHz (maximum frequency)
* 32K x 16-bit on-chip dual-access RAM (DARAM) (64 KB)
* 48K x 16-bit on-chip single-access RAM (SARAM) (96 KB)
* 24 KB I-cache
* One/two instructions executed per cycle
* Video hardware accelerators for DCT, iDCT, pixel interpolation, and motion estimation for video compression

ARM926TEJ core subsystem
* Up to 220 MHz ARM926TEJ V5 architecture (maximum frequency)
* 32 KB I-cache; 16 KB D-cache
* Java acceleration
* Support for 32-bit and 16-bit (Thumb mode) instruction sets
* Data and program MMUs
* Two 64-entry translation look-aside buffers (TLBs) for MMUs
* 17-word write buffer

Since the DSP has hardware accelerators for (i)DCT etc., I really think it should perform faster than the ARM CPU.

I also think (but I am not sure) that TI's 320C55x DSP has a 16-bit opcode length whereas the ARM uses 32 bits (outside Thumb mode), so the DSP core should be able to hold more program code in its 24 KB instruction cache than the ARM core.

Further, it would be interesting to know which MPEG4 decoder the N770 uses; there are a couple of implementations for 320C55x DSPs around, more or less optimized.

-Klaus
--
Klaus Rotter * klaus at rotters dot de * www.rotters.de
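Editor's note: the bandwidth question can be put in rough numbers. The sketch below computes how many bytes per second must be written to the framebuffer for the two resolutions discussed in this thread, and how many transfers that means on a 16-bit bus. It assumes 16 bits per pixel (for example RGB565) and one write per pixel per frame; both are assumptions for illustration, and the figure ignores reference-frame reads, code fetches, and bus protocol overhead.

```python
def framebuffer_bytes_per_second(width, height, fps, bytes_per_pixel=2):
    """Bytes per second needed to write every frame once (assumes 16 bpp by default)."""
    return width * height * bytes_per_pixel * fps


def bus_transfers_per_second(bytes_per_second, bus_width_bits=16):
    """Number of bus transfers per second for a bus of the given width."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_second // bytes_per_transfer


full = framebuffer_bytes_per_second(640, 480, 25)      # 15,360,000 bytes/s
quarter = framebuffer_bytes_per_second(320, 240, 25)   # 3,840,000 bytes/s
full_transfers = bus_transfers_per_second(full)        # 7,680,000 transfers/s
```

Under these assumptions the 640x480 case needs four times the write traffic of the 320x240 case, which is one reason hardware pixel doubling is attractive. Whether the bus is actually the bottleneck still has to be measured.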
Re: [maemo-developers] gstreamer launcher (proper video sink?)
Hi all,

first, sorry if my English isn't as good as I would like.

On this topic: I'm working on a partial port of VLC to the N770. My primary target is to play an mpeg2-ts stream obtained via wifi from a Dreambox 7000S STB (satellite TV receiver). The stream is 528x576, 4:3, 25 fps.

I've had partial success so far. Currently I get video decoding and output at 35 fps with audio disabled, taking only the luma information for a fast black-and-white colorspace conversion combined with half scaling (pixel skipping).

My project is built on top of the following components:
- VLC: GUI, control operation, network layer, video output (SDL module: colorspace conversion YUV2RGB and stretching)
- libdvbpsi: stream demux
- libmpeg2: video decoding
- libmad: audio decoding

I modified it in the following way:
1. libmpeg2: implemented a fast IDCT8 using the EDSP extensions and minimized memory operations. I've done some experiments with an IDCT4 implementation in order to reduce the workload.
2. libmpeg2: skip the IDCT for U and V blocks.
3. vlc (SDL video output module): stretching via pixel skipping combined with a fast colorspace conversion RGB = YYY.

I've been thinking that a fixed-point colorspace conversion routine using EDSP could be better for my project than the current SDL implementation, which uses an array of precalculated values, but I'm not sure about that.

Now I'm studying the audio problem. libmad is a bad choice because it uses 32-bit fixed-point operations (so it can't benefit from the 16-bit MUL and MAC EDSP operations), and the ARM is already stressed doing video decoding, etc. I would like to use the DSP to do the audio decoding and balance the work. At first I thought I would have to implement a DSP audio codec, but while studying the DSP bridge I found that dsptask 11 is named mp2dec:

    / $ cat /sys/bus/dsptask/devices/dsptask11/devname
    mp2dec

Is it an mpeg2 audio decoder implemented on the DSP? Can you give me info about how I could use it in my project?

I've been reading a bit about gstreamer, but I have never used it. Reading the web tutorial I got the impression that in order to use gstreamer properly it must be the main thread, which seems bad for the VLC architecture. I would like to use the DSP codec directly, feeding it the audio frames demuxed by libdvbpsi. Can it be done?

Could you give me some info on this topic, or a pointer on how to use gstreamer in a slave fashion? About gstreamer: the samples I found are related to command-line tools. Could you point me to how to use it from C?

Thanks in advance.

Josep Torra Valles
http://n770galaxy.blogspot.com/
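Editor's note: step 3 in the message above (luma-only conversion, RGB = YYY, combined with pixel-skipping half scaling) can be sketched as follows. This is a plain Python illustration of the idea, not the VLC or SDL code. The RGB565 output format and the function names are assumptions made for the example.

```python
def gray_to_rgb565(y):
    """Pack one 8-bit luma value as an RGB565 pixel with R = G = B = Y."""
    return ((y >> 3) << 11) | ((y >> 2) << 5) | (y >> 3)


def luma_to_rgb565_half(luma, width, height):
    """Convert a row-major luma plane to RGB565 at half size.

    Scaling is done by pixel skipping: only every second pixel of every
    second row is kept, so no chroma data and no filtering are needed.
    """
    out = []
    for row in range(0, height, 2):
        base = row * width
        for col in range(0, width, 2):
            out.append(gray_to_rgb565(luma[base + col]))
    return out
```

For the 528x576 stream mentioned above this would produce a 264x288 grayscale image per frame while touching only a quarter of the luma samples, which is where most of the saving comes from.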