CPU video decoding test, Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-05-13 Thread Frantisek Dufka

David D. Hagood wrote:

> In short, clock per clock, I suspect the DSP can do more video work than
> the ARM can.
>
> Now, you *could* make the argument that if the workload of decoding
> video could somehow be split between the 2 processor cores there might
> be some benefit - maybe leave the video scaling to the ARM but let the
> video coding be done by the DSP core. However, the real limiting factor
> may not even be MIPS, but rather bandwidth
>
> <snip>
>
> Remember what step 0 of optimization is: MEASURE IT FIRST!
>
> Until somebody can actually measure where all the time is going, making
> pronouncements like "It's slow because of X - I just know it" are the
> root of all evil - you spend a great deal of time tweaking that one
> thing only to find out that it was only 1% of the time to begin with.


Hello again,

I compiled the current CVS version of ffmpeg
http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/?cvsroot=FFMpeg to stop
producing pure theories and see how video really plays on the OMAP 1710
CPU in the N770. ffmpeg compiles fine in scratchbox with a few
configuration tweaks and also includes a simple SDL-based player called
ffplay. libavcodec has some optimizations for the armv4l architecture
written in ARM assembly. When these are enabled, video playback in
ffplay is very good.


When converting video for the Tungsten T2 (OMAP 1510) and the TCPMP
player, I usually use something like this:


mencoder.exe %NAME%.avi  -audio-preload 0.8 -delay 0.1  -af volnorm 
-srate 44100 -oac mp3lame -lameopts mode=2:cbr:br=128  -noodml  -vf 
scale=320:240 -sws 9 -ovc lavc -lavcopts 
vcodec=mpeg4:vhq:vmax_b_frames=0:vbitrate=304 -ffourcc DIVX -o 
%NAME%_palm.avi


Such videos play adequately at 25 fps on the T2. In some scenes frames
are skipped, but generally the playback is good.


I used the same files on the N770. They also play acceptably in the N770
video player, but playback is a bit worse than on the T2, and in more
complex scenes the video player hangs randomly. When this happens, video
is unplayable for some time (10-20 seconds?) until the DSP is
automatically restarted.


When I tried the same videos with ffplay, they played fine with the
audio turned off (ffplay -an video.avi). It looks like the ffmpeg
libavcodec mp3 implementation is not optimized for ARM (uses floats?).
By default the video plays scaled to 640x480, and even at this
resolution playback is fluent. CPU utilization is between 50% and 100%,
mostly around 75% (just a guess from the load plugin applet). At
320x240, 'ffplay -an -x 320 -y 240 video.avi' (which is what the default
N770 video player does, as it uses HW pixel doubling), it is even
better. At this resolution the CPU is rarely at 100%, mostly at 50-75%.
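
The load applet only gives a rough reading. A minimal sketch of
measuring CPU utilization directly, assuming a Linux /proc/stat whose
first line has the usual layout (user, nice, system, idle, iowait, irq,
softirq, steal). The function names are hypothetical and are not part
of ffplay or maemo:

```python
def read_cpu_times(path="/proc/stat"):
    """Return (busy, total) jiffies from the aggregate 'cpu' line.

    Assumption: fields are user nice system idle iowait irq softirq steal.
    Later fields (guest time) are ignored because they are already
    counted inside 'user'.
    """
    with open(path) as f:
        fields = f.readline().split()
    values = [int(v) for v in fields[1:9]]
    # Treat both idle and iowait as "not busy".
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total


def busy_percent(before, after):
    """CPU busy percentage between two read_cpu_times() samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total else 0.0
```

Take one sample before and one after a few seconds of playback and pass
both to busy_percent() to get an average for that interval.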


You can download an ffplay binary compiled for the N770 from
http://fanoush.webpark.cz/maemo/ffplay.gz for a quick test with your
videos. If you get "access denied", paste the URL directly into the URL
bar; webpark.cz free hosting doesn't like direct links to binaries
(= foreign HTTP Referer field). Or just check out ffmpeg from CVS and
compile it yourself.


Of course this is just a proof of concept, as audio is not usable, but
it shows that the CPU is fast enough to decode mp4 video better
(= faster, more stable) than the current DSP implementation. It also
shows that the 'bandwidth problem' is not so bad: even at 640x480,
blitting to video memory seems to be fast enough.
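
As a rough check on the bandwidth argument, the raw blit rate can be
estimated with simple arithmetic. This sketch assumes 16 bits per pixel
(RGB565) and full-frame updates, neither of which is stated above; the
function name is made up for illustration:

```python
def blit_bytes_per_second(width, height, fps, bytes_per_pixel=2):
    """Bytes written to the framebuffer per second for full-frame updates.

    bytes_per_pixel=2 is an assumption (RGB565), not a measured value.
    """
    return width * height * bytes_per_pixel * fps


for w, h in ((640, 480), (320, 240)):
    rate = blit_bytes_per_second(w, h, 25)
    print("%dx%d @ 25 fps: %.2f MB/s" % (w, h, rate / 1e6))
```

Under those assumptions 640x480 at 25 fps needs about 15.4 MB/s of
writes, and 320x240 a quarter of that.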


Maybe there is also some room for further optimization. The ffmpeg code
doesn't use the EDSP instructions available in ARMv5TE (maybe they are
not so useful in reality?), and it is also not the fastest
implementation even for ARMv4. The TCPMP player uses a different and
faster mp4 decoder. It includes an optional ffmpeg plugin, but only as a
slower, more compatible implementation. I'm also not sure how well
optimized the SDL code on the N770 is. From the kernel framebuffer
source (drivers/video/omap/hwa742.c) it looks like the display supports
YUV surfaces directly, but maybe ffplay and SDL use RGB, so there are
one or two extra YUV-to-RGB conversion steps.
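
To give an idea of what one such conversion step costs per pixel, here
is a minimal integer-only sketch. It uses the common 8-bit fixed-point
approximation of the BT.601 video-range conversion and packs the result
as RGB565. This is an illustration only; it is not the code SDL or
ffplay actually use, and the function names are hypothetical:

```python
def clamp8(x):
    """Clamp an integer to the 0..255 range of one 8-bit channel."""
    return 0 if x < 0 else 255 if x > 255 else x


def yuv_to_rgb(y, u, v):
    """Convert one video-range (16..235) YUV pixel to 8-bit RGB.

    Coefficients are the BT.601 factors scaled by 256, so the whole
    conversion needs only integer multiplies, adds and shifts.
    """
    c, d, e = y - 16, u - 128, v - 128
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


def pack_rgb565(r, g, b):
    """Pack 8-bit RGB into a 16-bit RGB565 framebuffer value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

That is several multiplies plus clamping for every pixel of every
frame, which is why writing YUV straight to the display would be
attractive if the hardware really supports it.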


Frantisek

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-27 Thread Frantisek Dufka

Josep Torra Valles wrote:

> / $ cat /sys/bus/dsptask/devices/dsptask11/devname
> mp2dec
>
> Is it an mpeg2 audio decoder implemented on DSP? Can you give me info
> about how I could use it in my project?


Sources of the gstreamer plugins that use the DSP would be extremely
useful for this, but they are not available. Multimedia may be part of
the next Maemo release, but I'm afraid it still may not include these
sources.


Frantisek


Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread Frantisek Dufka

[EMAIL PROTECTED] wrote:

> Both of them works! : )

That's cool :)

> Question:
>
> 1) Is it correct for me to use the dspfbsink directly?  (Frame
> buffer)  Any pre-processing necessary?

I'd guess it isn't, but I don't know. You would be missing the new
window, so it looks like there should be another way.

> 2) The w.mpg plays correctly on the bundled 770 movie player, but
> pretty slowly with a lot of frame skips.  Is the CPU of Nokia 770 fast
> enough to handle more advanced codecs, like xvid?


All the decoding is done by the DSP. This is IMO a bad design choice for
(advanced) video. The DSP is simply overloaded and too slow for this.
When playing the Ice Age trailer you can see with the load applet that
the CPU sits almost idle while the DSP is struggling to decode
everything. This makes sense for background sound playback, but not for
video. You hardly ever play video in the background :-)


It would be interesting to hook in an mp4 gstreamer plugin that decodes
video on the main CPU. This could result in much higher framerates or
better resolution and better utilization of both main CPU and DSP. Or is
there some catch why doing it on DSP makes better sense? AFAIK the main
ARM CPU should be better suited for the task than the DSP.


BTW, how much dedicated video memory is there? Can the video driver do
double buffering directly in video memory? This would be useful for
video playback too.


Frantisek


Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread Gustavo Sverzut Barbieri
On 4/26/06, Frantisek Dufka [EMAIL PROTECTED] wrote:
> All the decoding is done by the DSP. This is IMO a bad design choice for
> (advanced) video. The DSP is simply overloaded and too slow for this.
> When playing the Ice Age trailer you can see with the load applet that
> the CPU sits almost idle while the DSP is struggling to decode
> everything. This makes sense for background sound playback, but not for
> video. You hardly ever play video in the background :-)

I think we should have modular options for using the DSP: one to decode
on the DSP, another to decode and display...

The DSP makes sense for decoding since it has some useful instructions,
like floating-point instructions, but it's not good for doing
everything: as for x-refresh sync, we can see tearing while viewing
videos with the Maemo Video Player, which could be avoided by using
double buffering and blits handled by the main CPU.

Also, with this we could use the CPU to do some overlays, like an OSD.

But I wouldn't put everything on the CPU, since the GUI would become too
unresponsive and maybe the battery would drain (just guessing here).


--
Gustavo Sverzut Barbieri


Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread David D. Hagood

Frantisek Dufka wrote:

> better resolution and better utilization of both main CPU and DSP. Or is
> there some catch why doing it on DSP makes better sense? AFAIK the main
> ARM CPU should be better suited for the task than the DSP.




While I've not started working in depth on the OMAP1710, I do have 
experience with signal processing and embedded programming - so take 
this for what it's worth.


The ARM does not have floating point, only integer math, so any
processing of the video has to be done in integer, and a great many of
the operations really don't fit well.


The DSP may not have floating point, but it has fixed point, which is a
hybrid of integer and floating point where 1.0 is represented (to
within one LSB) by 0x7FFF and -1.0 by 0x8000. The DSP automatically
handles the shift after a multiply, so that 0x7FFF * 0x7FFF comes out
as 0x7FFE, which is still about 1.0 (1.0 * 1.0 = 1.0), so you can do
many sorts of non-integer math very quickly. Also, many of the
operations in video codecs are multiply-and-accumulate operations
( a += b*c ), which DSPs have a single instruction to do.
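
A minimal sketch of the 16-bit fixed-point format described above
(commonly called Q15), written out in plain integer arithmetic. The
helper names are hypothetical, and this only models the arithmetic; it
says nothing about how a particular DSP or the ARM EDSP instructions
schedule it:

```python
Q15_MAX = 0x7FFF    # largest positive value, about 0.99997
Q15_MIN = -0x8000   # exactly -1.0


def saturate(x):
    """Clamp a wide intermediate result back into the 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, x))


def to_q15(x):
    """Convert a float in [-1, 1] to Q15, saturating at the ends."""
    return saturate(int(round(x * 32768)))


def q15_mul(a, b):
    """Multiply two Q15 values: wide product, then shift right by 15."""
    return saturate((a * b) >> 15)


def q15_dot(xs, ys):
    """Dot product as a chain of multiply-accumulates (acc += b * c).

    The accumulator keeps the wide products; the shift back to Q15
    happens once at the end, as it would with a hardware accumulator.
    """
    acc = 0
    for b, c in zip(xs, ys):
        acc += b * c
    return saturate(acc >> 15)
```

For example, q15_mul(to_q15(0.5), to_q15(0.5)) gives to_q15(0.25), and
q15_mul(0x7FFF, 0x7FFF) gives 0x7FFE, one LSB below the input.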


In short, clock per clock, I suspect the DSP can do more video work than 
the ARM can.


Now, you *could* make the argument that if the workload of decoding 
video could somehow be split between the 2 processor cores there might 
be some benefit - maybe leave the video scaling to the ARM but let the 
video coding be done by the DSP core. However, the real limiting factor 
may not even be MIPS, but rather bandwidth - the speed at which the ARM 
can pull the data out of flash (esp. MMC - that's not exactly the 
speediest interface in the world), and the speed at which data can be 
moved into the video buffer.


Remember what step 0 of optimization is: MEASURE IT FIRST!

Until somebody can actually measure where all the time is going, making 
pronouncements like "It's slow because of X - I just know it" are the 
root of all evil - you spend a great deal of time tweaking that one 
thing only to find out that it was only 1% of the time to begin with.



Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread Frantisek Dufka

David D. Hagood wrote:

> Also,
> many of the operations in video codecs are multiply-and-accumulate
> operations ( a += b*c ), which DSPs have a single instruction to do.
>
> In short, clock per clock, I suspect the DSP can do more video work than
> the ARM can.

Well, the OMAP 1710 is ARMv5TEJ - the E stands for the EDSP extensions,
so it has MAC operations too.

> Remember what step 0 of optimization is: MEASURE IT FIRST!
>
> Until somebody can actually measure where all the time is going, making
> pronouncements like "It's slow because of X - I just know it" are the
> root of all evil - you spend a great deal of time tweaking that one
> thing only to find out that it was only 1% of the time to begin with.


True. Well, as for the measuring, I only know that my Tungsten T2 (OMAP
1510, 168 MHz) can play 320x240, 25 fps, 300 kbit/s mp4 videos better
than my N770. And TCPMP (Palm OS video player) uses the ARM core only.
So I suppose even an ARMv4 core is good enough. I guess a 250 MHz ARMv5
together with the DSP helping with decoding audio (and maybe some video
step - blitting with color space conversion?) could do much better.


You can watch /sys/devices/platform/dsp/loadinfo while playing video;
the DSP is constantly at 100%.
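
A small sketch for logging that file over time rather than watching it
by hand. The format of loadinfo is not described here, so the sketch
makes no attempt to parse it and simply collects the raw text; the
function name and parameters are hypothetical:

```python
import time


def sample_file(path, count=5, interval=1.0, sleep=time.sleep):
    """Read the whole file `count` times, `interval` seconds apart.

    Returns the raw text of each read. No assumption is made about the
    file's format; interpreting the samples is left to the reader.
    """
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append(f.read().strip())
        if i + 1 < count:
            sleep(interval)
    return samples
```

On the device this would be called as
sample_file("/sys/devices/platform/dsp/loadinfo", count=10) while a
video is playing.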


But I admit these are just my theories. I have not yet had time to
investigate the gstreamer framework on the N770 in detail.


Frantisek


Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread klaus
On 26 Apr 2006 at 14:47, Frantisek Dufka wrote:
> True. Well, as for the measuring, I only know that my Tungsten T2 (OMAP
> 1510, 168 MHz) can play 320x240, 25 fps, 300 kbit/s mp4 videos better
> than my N770. And TCPMP (Palm OS video player) uses the ARM core only.
> So I suppose even an ARMv4 core is good enough. I guess a 250 MHz ARMv5
> together with the DSP helping with decoding audio (and maybe some video
> step - blitting with color space conversion?) could do much better.

The OMAP 1510 has an internal framebuffer - the framebuffer in the
N770 is external (I think) and has to be accessed via the 16-bit
memory bus. There are internal caches, but program code and data
also have to be transported via the 16-bit memory bus. Maybe that's
the bottleneck. The OMAP 1510 also has just a 16-bit bus, but it has
an internal frame buffer, which may be accessed faster and without
blocking the external 16-bit bus. The DSP and the ARM CPU core also
share the same memory bus. The cache sizes and built-in memories of
the OMAP 1710 CPU are:

TMS320C55x DSP core subsystem

* Up to 220 MHz (maximum frequency)
* 32K x 16-bit on-chip dual-access RAM (DARAM) (64 KB)
* 48K x 16-bit on-chip single-access RAM (SARAM) (96 KB)
* 24 KB I-cache
* One/two instructions executed per cycle
* Video hardware accelerators for DCT, iDCT, pixel interpolation, 
and motion estimation for video compression

ARM926TEJ core subsystem

* Up to 220 MHz ARM926TEJ V5 architecture (maximum 
frequency)
* 32KB I-cache; 16KB D-cache
* Java acceleration
* Support for 32-bit and 16-bit (thumb mode) instruction sets
* Data and program MMUs
* Two 64-entry translation look-aside buffers (TLBs) for MMUs
* 17-word write buffer

Since the DSP has hardware accelerators for (i)DCT etc., I really think
it should perform faster than the ARM CPU. I also think (but I am not
sure) that TI's 320C55x DSP has a 16-bit opcode length whereas the
ARM uses 32 bits (when not in Thumb mode), so the DSP core should be
able to store more program code in its 24 KB instruction cache than the
ARM core.

It would also be interesting to know which MPEG4 decoder the N770 uses;
there are a couple of implementations for 320C55x DSPs around, more
or less optimized.

-Klaus
-- 
 Klaus Rotter * klaus at rotters dot de * www.rotters.de



Re: [maemo-developers] gstreamer launcher (proper video sink?)

2006-04-26 Thread Josep Torra Valles
Hi ppl,

first, sorry if my English isn't as good as I would like.

On this topic: I'm working on a partial port of VLC to the N770. My
primary target is to play an MPEG-2 TS stream obtained via WiFi from a
Dreambox 7000S STB (satellite TV receiver). The stream is 528x576, 4:3,
25 fps.

I've had partial success so far. Currently video decoding/output runs
at 35 fps with audio disabled, taking only the luma information for a
fast black-and-white colorspace conversion combined with half scaling
(pixel skipping).

My project is built on top of the following components:

- VLC: GUI, control operation, network layer, video output (SDL
module: colorspace conversion YUV2RGB and stretching)
- libdvbpsi: stream demux
- libmpeg2: video decoding
- libmad: audio decoding

I modified them in the following way:

1. libmpeg2: implemented a fast IDCT8 using the EDSP extensions and
minimized memory operations. I've done some experiments with an IDCT4
implementation in order to reduce the workload.
2. libmpeg2: skip the IDCT for U and V blocks.
3. vlc (SDL video output module): stretching via pixel skipping combined
with a fast colorspace conversion, RGB = YYY. I've been thinking that a
fixed-point colorspace conversion routine using EDSP could be better for
my project than the current SDL implementation, which uses an array of
precalculated values, but I'm not sure about that.
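
The luma-only conversion with pixel skipping from item 3 can be
sketched roughly as follows. This is only an illustration of the idea,
not the actual VLC/SDL patch; it assumes an RGB565 output and uses the
luma value directly as the gray level without rescaling from video
range. The function name is hypothetical:

```python
def luma_to_rgb565_half(y_plane, width, height):
    """Halve a luma plane by pixel skipping and emit gray RGB565 pixels.

    y_plane is a bytes-like object of width * height luma samples.
    Every other row and every other column is dropped, and each kept
    sample is written as R = G = B = Y (the "RGB = YYY" shortcut).
    """
    out = []
    for row in range(0, height, 2):        # skip every other line
        base = row * width
        for col in range(0, width, 2):     # skip every other pixel
            y = y_plane[base + col]
            out.append(((y >> 3) << 11) | ((y >> 2) << 5) | (y >> 3))
    return out
```

No multiplies are needed at all, and the U and V planes are never
touched, which fits with skipping their IDCT in item 2.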

Now I'm studying the audio problem. libmad is a bad choice because it
uses 32-bit fixed-point operations (so it can't benefit from the 16-bit
MUL and MAC EDSP operations) and the ARM is already stressed doing video
decoding, etc.

I would like to use the DSP for audio decoding in order to balance the
work.

At first I thought that I would have to implement a DSP audio codec, but
while studying the DSP bridge I found that dsptask 11 is named mp2dec.

/ $ cat /sys/bus/dsptask/devices/dsptask11/devname
mp2dec

Is it an mpeg2 audio decoder implemented on DSP? Can you give me info
about how I could use it in my project?

I've been reading a bit about gstreamer, but I have never used it. From
the web tutorial I got the impression that to use gstreamer properly it
must run in the main thread, which seems a bad fit for the VLC
architecture. I would like to use the DSP codec directly, feeding it the
audio frames demuxed by libdvbpsi. Can that be done? Could you give me
some info on this topic, or a pointer on how to use gstreamer in a slave
fashion?

About gstreamer: the samples I found are all for the command-line tools.
Could you point me to how to use it from C?

Thanks in advance.

Josep Torra Valles
http://n770galaxy.blogspot.com/

