Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-15 Thread Kalle Vahlman

2007/1/15, [EMAIL PROTECTED] [EMAIL PROTECTED]:

On 13 Jan 2007 at 21:00, Kalle Vahlman wrote:
 For the cairo audience there's the question of the tessellation
 process, can it really be so fast on the PXA-320 or is there a bug
 somewhere that twists the results? What could be so good in PXA-320
 (or not-good on the other devices) that the results are so drastic?

I didn't know that the N800 has a PXA-320 uC by Marvell.


It doesn't. I had three devices (plus my laptop) that I ran the test
on, 770, N800 and a PXA-320-based board.


Does anyone have more detailed information about the hardware used in the N800?


As mentioned in the blog entry, it's TI OMAP 2420. Also see:

 http://maemo.org/faq/faq.html#faq-N10129

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-15 Thread klaus
On 15 Jan 2007 at 10:27, Kalle Vahlman wrote:
 As mentioned in the blog entry, it's TI OMAP 2420. Also see:

At least it seems that the TI OMAP is able to drive a 32-bit data path to the
DDR RAM. But it says it has just 5 Mbit of internal framebuffer RAM. That is
5242880 bits, which is sufficient for 5242880 / 800 / 480 = 13.653... bits per
pixel. Either the N800 has a dedicated display controller, or it only supports
12 bits/pixel.
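Klaus's arithmetic can be checked with a small sketch. It assumes, as he does, that "5 Mbit" means 5 × 1024 × 1024 bits and that one whole 800x480 frame has to fit in that internal RAM; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Bits per pixel available if one width x height frame must fit
   entirely in fb_bits bits of on-chip RAM. */
double bits_per_pixel(unsigned long fb_bits, unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    return (double)fb_bits / ((double)width * (double)height);
}

/* Returns 1 if a frame at the given depth fits in fb_bits, 0 otherwise. */
int frame_fits(unsigned long fb_bits, unsigned width, unsigned height,
               unsigned bpp)
{
    unsigned long needed = (unsigned long)width * height * bpp;
    return needed <= fb_bits;
}

/* Prints the check for an 800x480 panel and 5 Mbit of RAM:
     available: 13.653 bits/pixel
     16 bpp fits: 0, 12 bpp fits: 1 */
void print_n800_check(void)
{
    const unsigned long fb_bits = 5UL * 1024 * 1024; /* 5242880 bits */
    printf("available: %.3f bits/pixel\n", bits_per_pixel(fb_bits, 800, 480));
    printf("16 bpp fits: %d, 12 bpp fits: %d\n",
           frame_fits(fb_bits, 800, 480, 16),
           frame_fits(fb_bits, 800, 480, 12));
}
```

A 16 bpp frame needs 6144000 bits and a 12 bpp frame 4608000 bits, which is why only the latter fits under this assumption.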

-Klaus
-- 
 Klaus Rotter * klaus at rotters dot de * www.rotters.de



RE: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-15 Thread Jakub.Pavelek
On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:

 Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
  Also Nokia 770 runs not at 220MHz as stated on your page, but at 
  something closer to 250MHz as shown by this test code program (and 
  confirmed to be actually 252MHz by somebody from Nokia on #maemo 
  about half a year ago).
 
  So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Well, if I were to create a conspiracy theory, I would suggest 
that it could be done on purpose to make N800 look like a 
bigger improvement when comparing it to 770 ;-)

But most likely it is just a typo, a lot of new docs became 
available lately, so they may contain some minor inaccuracies.

Not a conspiracy; it simply used to be 220 MHz and only later went up to
252 MHz.

Br,

--jakub


Re: [cairo] Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-15 Thread Daniel Amelang

On 1/15/07, Kalle Vahlman [EMAIL PROTECTED] wrote:

2007/1/16, Daniel Amelang [EMAIL PROTECTED]:
 On 1/13/07, Kalle Vahlman [EMAIL PROTECTED] wrote:
  2007/1/14, Koen Kooi [EMAIL PROTECTED]:
   Siarhei Siamashka wrote:
On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:
  
As for optimizing code for ARM (targeting Nokia 770), there are a few things
that are slow (maybe this list is still incomplete):
1. Floating point math is slow without vfp (cairo contains a lot of fp math)
  
   Actually not very much if you build it with --disable-some-floatingpoint
 
  I didn't (had forgot about the whole flag :), but will do (should be
  interesting).

 That flag doesn't do anything yet, so it shouldn't be interesting :)

So does grep tell me :)

I guess all the improvements I thought were under that flag were in
fact general improvements... Which is better, I guess.


That's right. We haven't yet made an improvement for FPU-less
platforms that resulted in a perceptible loss on others. The time may
come, though, so the flag remains.

Dan


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Marius Gedminas
On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
 Also Nokia 770 runs not at 220MHz as stated on your page, but at 
 something closer to 250MHz as shown by this test code program 
 (and confirmed to be actually 252MHz by somebody from Nokia 
 on #maemo about half a year ago).

So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Marius Gedminas
-- 
If the code and the comments disagree, then both are probably wrong.
-- Norm Schryer




Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Marius Gedminas
On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
 On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
  Also Nokia 770 runs not at 220MHz as stated on your page, but at 
  something closer to 250MHz as shown by this test code program 
  (and confirmed to be actually 252MHz by somebody from Nokia 
  on #maemo about half a year ago).
 
 So http://maemo.org/faq/faq.html#faq-N10129 is lying?

The OMAP1710 page from Texas Instruments also claims 220 MHz is the
maximum frequency:
http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=11991&contentId=4670

Marius Gedminas
-- 
Mosher's Law of Software Engineering:
Don't worry if it doesn't work right.
If everything did, you'd be out of a job.




Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Frantisek Dufka

Marius Gedminas wrote:

On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:

On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
Also Nokia 770 runs not at 220MHz as stated on your page, but at 
something closer to 250MHz as shown by this test code program 
(and confirmed to be actually 252MHz by somebody from Nokia 
on #maemo about half a year ago).

So http://maemo.org/faq/faq.html#faq-N10129 is lying?


The OMAP1710 page from Texas Instruments also claims 220 MHz is the
maximum frequency:



Check /proc/omap_clock on the device: it says 252 MHz for both the ARM and the DSP core.
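Looking up one clock in that file can be sketched in C. This assumes the "name rate count" line layout seen in the N800 dump later in this thread; the thread itself uses both /proc/omap_clock and /proc/omap_clocks, and clock names differ between OMAP generations, so the path and names passed in are assumptions, as are the function names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scans a stream of "name rate count" lines for the clock called `name`.
   Stores its rate in *hz and returns 1 when found, 0 otherwise. */
int find_clock_hz(FILE *f, const char *name, unsigned long *hz)
{
    char line[256];
    assert(f != NULL && name != NULL && hz != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        char clk[64];
        unsigned long rate;
        if (sscanf(line, "%63s %lu", clk, &rate) == 2 && strcmp(clk, name) == 0) {
            *hz = rate;
            return 1;
        }
    }
    return 0;
}

/* Usage sketch: print one clock from a proc file given by the caller. */
int print_clock(const char *path, const char *name)
{
    unsigned long hz;
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    if (find_clock_hz(f, name, &hz))
        printf("%s: %.1f MHz\n", name, hz / 1e6);
    else
        printf("%s: not found\n", name);
    fclose(f);
    return 0;
}
```

For example, print_clock("/proc/omap_clocks", "dsp_fck") on a device that exposes the file under that name.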


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Koen Kooi
Marius Gedminas wrote:
 On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
 On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
 Also Nokia 770 runs not at 220MHz as stated on your page, but at 
 something closer to 250MHz as shown by this test code program 
 (and confirmed to be actually 252MHz by somebody from Nokia 
 on #maemo about half a year ago).
 So http://maemo.org/faq/faq.html#faq-N10129 is lying?
 
 The OMAP1710 page from Texas Instruments also claims 220 MHz is the
 maximum frequency:
 http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=11991&contentId=4670

I've seen 330MHz 1710 units. It's a matter of how nice you are to TI :)

regards,

Koen


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Siarhei Siamashka
On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:

 Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
  Also Nokia 770 runs not at 220MHz as stated on your page, but at
  something closer to 250MHz as shown by this test code program
  (and confirmed to be actually 252MHz by somebody from Nokia
  on #maemo about half a year ago).
 
  So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Well, if I were to create a conspiracy theory, I would suggest that it could
be done on purpose to make N800 look like a bigger improvement when 
comparing it to 770 ;-)

But most likely it is just a typo, a lot of new docs became available lately,
so they may contain some minor inaccuracies.

  The OMAP1710 page from Texas Instruments also claims 220 MHz is the
  maximum frequency:

 Check /proc/omap_clock on the device: it says 252 MHz for both the ARM and the DSP core.

Hmm, interesting. Can anybody check /proc/omap_clock on an N800 device?
I'm particularly curious about the DSP clock frequency (as it may actually be
lower than on the 770).


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Larry Battraw

On 1/14/07, Siarhei Siamashka [EMAIL PROTECTED] wrote:

On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:

 Marius Gedminas wrote:

(snip)

 Check /proc/omap_clock on the device: it says 252 MHz for both the ARM and the DSP core.

Hmm, interesting. Can anybody check /proc/omap_clock on an N800 device?
I'm particularly curious about the DSP clock frequency (as it may actually be
lower than on the 770).


To avoid more questions about clocks, here is the full output of
/proc/omap_clocks on an N800. Does anyone know what the second number
is? A divisor, perhaps?

Larry

usb_fck 4800 0
pka_ick 109714285 0
aes_ick 109714285 0
rng_ick 109714285 0
sha_ick 109714285 0
des_ick 109714285 0
vlynq_fck 9600 0
vlynq_ick 109714285 0
i2c_fck 1200 0
i2c_ick 109714285 0
i2c_fck 1200 0
i2c_ick 109714285 0
hdq_fck 1200 0
hdq_ick 109714285 0
eac_fck 9600 1
eac_ick 109714285 1
fac_fck 1200 0
fac_ick 109714285 0
mmc_fck 9600 0
mmc_ick 109714285 1
mspro_fck 9600 0
mspro_ick 109714285 0
wdt3_fck 32000 0
wdt3_ick 109714285 0
wdt4_fck 32000 0
wdt4_ick 109714285 0
mailboxes_ick 109714285 1
cam_ick 109714285 0
cam_fck 9600 0
omapctrl_ick 109714285 1
wdt1_ick 109714285 0
sync_32k_ick 109714285 1
mpu_wdt_fck 32000 1
mpu_wdt_ick 109714285 1
gpios_fck 32000 1
gpios_ick 109714285 1
uart3_fck 4800 0
uart3_ick 109714285 0
uart2_fck 4800 0
uart2_ick 109714285 0
uart1_fck 4800 0
uart1_ick 109714285 0
mcspi_fck 4800 0
mcspi_ick 109714285 0
mcspi_fck 4800 0
mcspi_ick 109714285 0
mcbsp2_fck 9600 0
mcbsp2_ick 109714285 0
mcbsp1_fck 9600 0
mcbsp1_ick 109714285 0
gpt12_fck 32000 0
gpt12_ick 109714285 0
gpt11_fck 32000 0
gpt11_ick 109714285 0
gpt10_fck 32000 0
gpt10_ick 109714285 0
gpt9_fck 32000 0
gpt9_ick 109714285 0
gpt8_fck 32000 0
gpt8_ick 109714285 0
gpt7_fck 32000 0
gpt7_ick 109714285 0
gpt6_fck 32000 0
gpt6_ick 109714285 0
gpt5_fck 32000 1
gpt5_ick 109714285 1
gpt4_fck 32000 0
gpt4_ick 109714285 0
gpt3_fck 32000 0
gpt3_ick 109714285 0
gpt2_fck 32000 0
gpt2_ick 109714285 0
gpt1_fck 32000 1
gpt1_ick 109714285 1
virt_prcm_set 0 0
ssi_l4_ick 109714285 0
l4_ck 109714285 10
usb_l4_ick 54857142 0
ssi_fck 219428571 0
core_l3_ck 109714285 1
dss_54m_fck 5400 0
dss2_fck 1920 0
dss1_fck 109714285 0
dss_ick 109714285 1
gfx_ick 109714285 0
gfx_2d_fck 109714285 0
gfx_3d_fck 109714285 0
iva1_mpu_int_ifck 54857142 0
iva1_ifck 109714285 0
dsp_fck 219428571 2
dsp_ick 109714285 1
mpu_ck 329142857 0
emul_ck 5400 1
sys_clkout2 3200 1
sys_clkout 5400 0
ck_wdt1_osc 1920 0
func_12m_ck 1200 0
func_48m_ck 4800 0
func_96m_ck 9600 2
sleep_ck 32000 0
core_ck 658285714 2
func_54m_ck 5400 1
apll54_ck 5400 2
apll96_ck 9600 2
dpll_ck 658285714 1
alt_ck 5400 0
sys_ck 1920 3
osc_ck 1920 1
func_32k_ck 32000 4
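The meaning of the second column is left open here, but the rates themselves can be checked. The sketch below (struct and function names are hypothetical) parses a "name rate second" line and computes the nearest integer ratio of core_ck (658285714) to another clock. For mpu_ck, dsp_fck, l4_ck and usb_l4_ick it gives 2, 3, 6 and 12, so these rates look like core_ck divided by small integers, which would put the MPU at about 329.1 MHz and the DSP functional clock at about 219.4 MHz on this unit. It says nothing about what the second number means.

```c
#include <assert.h>
#include <stdio.h>

/* One line of the dump: clock name, rate in Hz, and the unexplained
   second number (kept as-is; its meaning is not established here). */
struct clock_line {
    char name[64];
    unsigned long hz;
    int second;
};

/* Parses "name rate second"; returns 1 on success, 0 on malformed input. */
int parse_clock_line(const char *s, struct clock_line *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%63s %lu %d", out->name, &out->hz, &out->second) == 3;
}

/* Nearest integer ratio parent_hz / child_hz, or 0 if child_hz is 0. */
unsigned long nearest_divider(unsigned long parent_hz, unsigned long child_hz)
{
    if (child_hz == 0)
        return 0;
    return (parent_hz + child_hz / 2) / child_hz;
}
```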


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-13 Thread Siarhei Siamashka
On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:

 We have all sorts of funny hardware at the office, so I thought I'd
 make a quick run of cairo-perf with the Cairo 1.3.10 snapshot and see
 how they relate to each other.

 There's some funny things I encountered in the results, and I hope
 people on both lists can offer insights on why.

 Details at

   http://syslog.movial.fi

 but let's just say that the results were predictable in general, with
 some surprises:

 N800 is naturally faster than 770, but I didn't expect the xlib
 backend to have so big differences between the two.

Maybe these devices were just running different Linux kernels (the task
scheduler may be different) and X servers? Quite a lot of code could be
different, so these results can't be used to compare the CPUs directly.

 For the cairo audience there's the question of the tessellation
 process, can it really be so fast on the PXA-320 or is there a bug
 somewhere that twists the results? What could be so good in PXA-320
 (or not-good on the other devices) that the results are so drastic?

How much cache does each of these devices have? If the PXA-320 has more
cache, and all the code and data needed for this test fit in it but not in the
cache of the competing device, that could explain the difference.

By the way, here you can get some code for benchmarking the CPU clock frequency:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/testfreq.c?root=mplayer&view=markup
It performs two test runs: the first run contains a loop with 10 add
instructions, the second run contains the same loop but empty.
Subtracting the time of the second run from the time of the first run gives
the time spent executing the add instructions alone. The number of such
instructions executed per second can be used to measure the CPU clock
frequency. For better precision you may want to increase the
TESTS_COUNT define, though the test will then take longer.
This test program can show results a bit lower than the actual clock
frequency (as we have a multitasking OS and other processes also
take some time). But the real CPU clock frequency can't be lower than the
benchmarked result :) Even on superscalar CPUs, these add
instructions can't run in parallel, as each new instruction depends
on the result of the previous one (hmm, I just realized that the last add
instruction in the loop can run in parallel with the subs that decrements the
loop counter, so some additional tweak may be required).
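The same two-run idea can be sketched in portable C. This is not a copy of testfreq.c: the constants are hypothetical, and instead of assembly it uses an empty GCC/Clang asm statement so the compiler cannot merge the dependent adds. It assumes one dependent add retires per cycle; as noted above, overlap of the loop overhead with the adds can skew the result on superscalar cores, so treat the number as rough.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical constants; testfreq.c has its own TESTS_COUNT. */
#define ADDS_PER_ITER 10
#define ITERATIONS 10000000L

/* One add that depends on the previous one. The empty asm statement makes
   the compiler treat x as modified, so the adds cannot be folded together. */
#define DEP_ADD(x) do { (x) += 1; __asm__ __volatile__("" : "+r"(x)); } while (0)

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Clock estimate in MHz from the two run times, assuming one dependent add
   per cycle. Returns 0 if the time difference is not positive. */
double estimate_mhz(double t_with_adds, double t_empty, double n_adds)
{
    double dt = t_with_adds - t_empty;
    return dt > 0.0 ? n_adds / dt / 1e6 : 0.0;
}

double measure_mhz(void)
{
    unsigned long x = 0;
    long i;
    double t0, t1, t2;

    t0 = now_sec();
    for (i = 0; i < ITERATIONS; i++) {          /* loop with 10 adds */
        DEP_ADD(x); DEP_ADD(x); DEP_ADD(x); DEP_ADD(x); DEP_ADD(x);
        DEP_ADD(x); DEP_ADD(x); DEP_ADD(x); DEP_ADD(x); DEP_ADD(x);
    }
    t1 = now_sec();
    for (i = 0; i < ITERATIONS; i++)            /* same loop, no adds */
        __asm__ __volatile__("" : "+r"(x));
    t2 = now_sec();

    assert(x == (unsigned long)ITERATIONS * ADDS_PER_ITER);
    return estimate_mhz(t1 - t0, t2 - t1, (double)ITERATIONS * ADDS_PER_ITER);
}
```

Calling printf("%.0f MHz\n", measure_mhz()) a few times and taking the highest value follows the reasoning above that other processes can only make the result lower.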

Also Nokia 770 runs not at 220MHz as stated on your page, but at 
something closer to 250MHz as shown by this test code program 
(and confirmed to be actually 252MHz by somebody from Nokia 
on #maemo about half a year ago).

As for optimizing code for ARM (targeting Nokia 770), there are a few things
that are slow (maybe this list is still incomplete):
1. Floating point math is slow without VFP (cairo contains a lot of FP math).
2. Integer division is slow ('/' and '%' operators), as ARM does not have a
hardware instruction for it and a much less efficient software implementation
is used.
3. Write access to memory that is not already in the cache is slow with the
read-allocate cache of the ARM926 core (data is not loaded into the cache on
writes), see more details here:
http://maemo.org/pipermail/maemo-developers/2006-December/006579.html
I have a crude patch for valgrind (the callgrind part) to simulate
read-allocate cache behaviour (instead of the write-allocate behaviour it
simulates by default); it can show the parts of the code which have lots of
cache misses. If anybody is interested, I can try to clean it up and submit
it upstream:
http://ufo2000.xcomufo.com/maemo/vg-read-allocate-cache-patch.diff
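For item 1, the usual workaround on FPU-less cores is fixed-point integer arithmetic. A minimal 16.16 sketch follows; the type and function names are hypothetical and this is not cairo's own fixed-point code. It assumes a 64-bit intermediate product is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t fixed16_t;

#define FIXED_ONE ((fixed16_t)1 << 16)

fixed16_t fixed_from_int(int i)
{
    assert(i >= -32768 && i <= 32767);   /* must fit in 16 integer bits */
    return (fixed16_t)((int32_t)i * 65536);
}

/* Multiply with a 64-bit intermediate, then shift back down: integer
   instructions only, no software floating point. Right-shifting a negative
   product is implementation-defined in C; GCC and Clang use an arithmetic
   shift, which rounds toward minus infinity. */
fixed16_t fixed_mul(fixed16_t a, fixed16_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (fixed16_t)(p >> 16);
}

/* Conversion for printing or tests only. */
double fixed_to_double(fixed16_t f)
{
    return (double)f / 65536.0;
}
```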


I also had a quick look at the cairo sources (without benchmarking, just to
see the general coding style). Some parts of the code are not optimal. For
example, this chunk from cairo-path-stroke.c relies on integer division
(it is unlikely to cause a severe performance decrease here, but it may become
a real problem in tight loops):
[cut]
for (i=start; i != stop; i = (i+1) % pen->num_vertices) {
    tri[2] = f->point;
    _translate_point (&tri[2], &pen->vertices[i].point);
    _cairo_traps_tessellate_triangle (stroker->traps, tri);
    tri[1] = tri[2];
}
[/cut]
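One way to drop the division from a ring walk like the one quoted above is a compare-and-reset on the index. The sketch below uses a hypothetical function that records the visited indices instead of tessellating; it is not cairo's code. The reset is equivalent to (i + 1) % n only while 0 <= i < n, which the assert checks for the starting index.

```c
#include <assert.h>
#include <stddef.h>

/* Visits ring indices start, start+1, ... up to (but excluding) stop,
   wrapping at n, without '%'. Writes up to cap visited indices to out
   and returns how many indices were visited. */
size_t ring_walk(int start, int stop, int n, int *out, size_t cap)
{
    size_t count = 0;
    int i;
    assert(n > 0 && start >= 0 && start < n && stop >= 0 && stop < n);
    for (i = start; i != stop; ) {
        if (count < cap)
            out[count] = i;
        count++;
        if (++i == n)   /* same as i = (i + 1) % n for 0 <= i < n */
            i = 0;
    }
    return count;
}
```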
If we go deeper into _cairo_traps_tessellate_triangle, we will notice the
following:
[cut]
memcpy (tsort, t, 3 * sizeof (cairo_point_t));
qsort (tsort, 3, sizeof (cairo_point_t), _compare_point_fixed_by_y);
[/cut]
There is an unnecessary memcpy operation, and qsort is called for just three
elements! Such performance bottlenecks are quite easy to spot almost
everywhere. Most likely the performance-critical code is optimized a lot
better, but at least this part deserved a comment such as
/* I know that it is slow, but this code is not performance critical and I'm
too lazy to optimize it */ :-)

Anyway, now it is no surprise to me that such huge improvements were possible
recently :-)

Also this code does 

Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-13 Thread Koen Kooi
Siarhei Siamashka wrote:
 On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:

 As for optimizing code for ARM (targeting Nokia 770), there are a few things
 that are slow (maybe this list is still incomplete):
 1. Floating point math is slow without vfp (cairo contains a lot of fp math)

Actually not very much if you build it with --disable-some-floatingpoint

regards,

Koen


Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-13 Thread Kalle Vahlman

2007/1/14, Koen Kooi [EMAIL PROTECTED]:

Siarhei Siamashka wrote:
 On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:

 As for optimizing code for ARM (targeting Nokia 770), there are a few things
 that are slow (maybe this list is still incomplete):
 1. Floating point math is slow without vfp (cairo contains a lot of fp math)

Actually not very much if you build it with --disable-some-floatingpoint


I didn't (had forgot about the whole flag :), but will do (should be
interesting).

--
Kalle Vahlman, [EMAIL PROTECTED]