Re: [cairo] Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On 1/15/07, Kalle Vahlman <[EMAIL PROTECTED]> wrote:
> 2007/1/16, Daniel Amelang <[EMAIL PROTECTED]>:
> > On 1/13/07, Kalle Vahlman <[EMAIL PROTECTED]> wrote:
> > > 2007/1/14, Koen Kooi <[EMAIL PROTECTED]>:
> > > > Siarhei Siamashka wrote:
> > > > > As for optimizing code for ARM (targeting Nokia 770), there are a few things
> > > > > that are slow (maybe this list is still incomplete):
> > > > > 1. Floating point math is slow without vfp (cairo contains a lot of fp math)
> > > >
> > > > Actually not very much if you build it with --disable-some-floatingpoint
> > >
> > > I didn't (had forgot about the whole flag :), but will do (should be
> > > interesting).
> >
> > That flag doesn't do anything yet, so it shouldn't be interesting :)
>
> So does grep tell me :)
>
> I guess all the improvements I thought were under that flag were in
> fact general improvements... Which is better, I guess.

That's right. We haven't yet made an improvement for FPU-less platforms that resulted in a perceptible loss on others. The time may come, though, so the flag remains.

Dan
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
Re: [cairo] Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
2007/1/16, Daniel Amelang <[EMAIL PROTECTED]>:
> On 1/13/07, Kalle Vahlman <[EMAIL PROTECTED]> wrote:
> > 2007/1/14, Koen Kooi <[EMAIL PROTECTED]>:
> > > Siarhei Siamashka wrote:
> > > > As for optimizing code for ARM (targeting Nokia 770), there are a few things
> > > > that are slow (maybe this list is still incomplete):
> > > > 1. Floating point math is slow without vfp (cairo contains a lot of fp math)
> > >
> > > Actually not very much if you build it with --disable-some-floatingpoint
> >
> > I didn't (had forgot about the whole flag :), but will do (should be
> > interesting).
>
> That flag doesn't do anything yet, so it shouldn't be interesting :)

So does grep tell me :)

I guess all the improvements I thought were under that flag were in fact general improvements... Which is better, I guess.

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi
RE: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
> On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:
>
>> Marius Gedminas wrote:
>> > On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
>> >> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
>> >>> Also Nokia 770 runs not at 220MHz as stated on your page, but at
>> >>> something closer to 250MHz as shown by this test code program (and
>> >>> confirmed to be actually 252MHz by somebody from Nokia on #maemo
>> >>> about half a year ago).
>> >>
>> >> So http://maemo.org/faq/faq.html#faq-N10129 is lying?
>
> Well, if I were to create a conspiracy theory, I would suggest
> that it could be done on purpose to make N800 look like a
> bigger improvement when comparing it to 770 ;-)
>
> But most likely it is just a typo, a lot of new docs became
> available lately, so they may contain some minor inaccuracies.

Not a conspiracy, it simply used to be 220 MHz and only later went up to 252 MHz.

Br,
--jakub
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On 15 Jan 2007 at 10:27, Kalle Vahlman wrote:
> As mentioned in the blog entry, it's TI OMAP 2420. Also see:

At least it seems that the TI OMAP is able to drive a 32 bit data path to the DDR-RAM. But it says it has just 5 MBit of internal framebuffer RAM. That is 5242880 bits, which is sufficient for

5242880 / 800 / 480 = 13.653... bits per pixel.

So either the N800 has a dedicated display controller, or it only supports 12 bits/pixel.

-Klaus

--
Klaus Rotter * klaus rotters de * www.rotters.de
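For anyone who wants to redo the arithmetic for other panel sizes, here is a minimal sketch in C. The function and parameter names are made up for illustration, and it assumes the whole framebuffer RAM holds exactly one frame with no padding, and that "5 MBit" means 5 * 1024 * 1024 bits as in the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Bits available per pixel if fb_bits of framebuffer memory hold exactly
 * one frame of width x height pixels. All names here are illustrative. */
double max_bits_per_pixel(uint64_t fb_bits, unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    return (double)fb_bits / ((double)width * (double)height);
}
```

Calling max_bits_per_pixel(5 * 1024 * 1024, 800, 480) gives a value a little above 13.65, so a 16 bits/pixel frame would not fit under these assumptions.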
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
"Frantisek Dufka" <[EMAIL PROTECTED]> writes:
>>> So http://maemo.org/faq/faq.html#faq-N10129 is lying?
>>
>> The OMAP1710 page from Texas Instruments also claims 220 MHz is the
>> maximum frequency:
>
> Check /proc/omap_clock on device, it says 252Mhz for both ARM and DSP core.

As I understand it, the ARM core in the 770 runs at 252 MHz. The FAQ is wrong, in my opinion.

(This is not the official Nokia view on the matter, just my personal comment.)

--
Kalle Valo
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
2007/1/15, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> On 13 Jan 2007 at 21:00, Kalle Vahlman wrote:
> > For the cairo audience there's the question of the tessellation
> > process, can it really be so fast on the PXA-320 or is there a bug
> > somewhere that twists the results? What could be so good in PXA-320
> > (or not-good on the other devices) that the results are so drastic?
>
> I didn't know that the N800 has a PXA-320 uC by Marvell.

It doesn't. I had three devices (plus my laptop) that I ran the test on: 770, N800 and a PXA-320-based board.

> Has anyone more detailed information about the hardware used in the N800?

As mentioned in the blog entry, it's TI OMAP 2420. Also see:

http://maemo.org/faq/faq.html#faq-N10129

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On 13 Jan 2007 at 21:00, Kalle Vahlman wrote:
> For the cairo audience there's the question of the tessellation
> process, can it really be so fast on the PXA-320 or is there a bug
> somewhere that twists the results? What could be so good in PXA-320
> (or not-good on the other devices) that the results are so drastic?

I didn't know that the N800 has a PXA-320 uC by Marvell. So I googled a bit and found that it has several interesting points. See here:

http://www.marvell.com/products/cellular/application/pxa320.jsp

* a 32 bit memory interface (the TI OMAP uC in the 770 has just 16 bit), IMHO very important
* a 256 kB L2 cache (the TI OMAP just has an L1, IMHO)
* a built-in 768 kB frame buffer with 2D accelerator

On the 770 there was just the 16 bit data path to SD-RAM, sharing the access to the video/frame buffer chip. So I think there is no longer a dedicated framebuffer chip in the N800.

So it becomes clearer why this release of IT2007 could not run on the 770. The underlying hardware seems to be very different.

Has anyone more detailed information about the hardware used in the N800?

-Klaus

--
Klaus Rotter * klaus rotters de * www.rotters.de
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On 1/14/07, Siarhei Siamashka <[EMAIL PROTECTED]> wrote:
> On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:
> > Marius Gedminas wrote:
> (snip)
> > Check /proc/omap_clock on device, it says 252Mhz for both ARM and DSP core.
>
> Hmm, interesting. Can anybody check /proc/omap_clock on N800 device?
> I'm particularly curious about DSP clock frequency (as it can be actually
> lower than on 770).

To avoid having more questions about clocks, here is the full output from /proc/omap_clocks on an N800. Does anyone know what the second number is? A divisor, perhaps?

Larry

usb_fck 4800 0
pka_ick 109714285 0
aes_ick 109714285 0
rng_ick 109714285 0
sha_ick 109714285 0
des_ick 109714285 0
vlynq_fck 9600 0
vlynq_ick 109714285 0
i2c_fck 1200 0
i2c_ick 109714285 0
i2c_fck 1200 0
i2c_ick 109714285 0
hdq_fck 1200 0
hdq_ick 109714285 0
eac_fck 9600 1
eac_ick 109714285 1
fac_fck 1200 0
fac_ick 109714285 0
mmc_fck 9600 0
mmc_ick 109714285 1
mspro_fck 9600 0
mspro_ick 109714285 0
wdt3_fck 32000 0
wdt3_ick 109714285 0
wdt4_fck 32000 0
wdt4_ick 109714285 0
mailboxes_ick 109714285 1
cam_ick 109714285 0
cam_fck 9600 0
omapctrl_ick 109714285 1
wdt1_ick 109714285 0
sync_32k_ick 109714285 1
mpu_wdt_fck 32000 1
mpu_wdt_ick 109714285 1
gpios_fck 32000 1
gpios_ick 109714285 1
uart3_fck 4800 0
uart3_ick 109714285 0
uart2_fck 4800 0
uart2_ick 109714285 0
uart1_fck 4800 0
uart1_ick 109714285 0
mcspi_fck 4800 0
mcspi_ick 109714285 0
mcspi_fck 4800 0
mcspi_ick 109714285 0
mcbsp2_fck 9600 0
mcbsp2_ick 109714285 0
mcbsp1_fck 9600 0
mcbsp1_ick 109714285 0
gpt12_fck 32000 0
gpt12_ick 109714285 0
gpt11_fck 32000 0
gpt11_ick 109714285 0
gpt10_fck 32000 0
gpt10_ick 109714285 0
gpt9_fck 32000 0
gpt9_ick 109714285 0
gpt8_fck 32000 0
gpt8_ick 109714285 0
gpt7_fck 32000 0
gpt7_ick 109714285 0
gpt6_fck 32000 0
gpt6_ick 109714285 0
gpt5_fck 32000 1
gpt5_ick 109714285 1
gpt4_fck 32000 0
gpt4_ick 109714285 0
gpt3_fck 32000 0
gpt3_ick 109714285 0
gpt2_fck 32000 0
gpt2_ick 109714285 0
gpt1_fck 32000 1
gpt1_ick 109714285 1
virt_prcm_set 0 0
ssi_l4_ick 109714285 0
l4_ck 109714285 10
usb_l4_ick 54857142 0
ssi_fck 219428571 0
core_l3_ck 109714285 1
dss_54m_fck 5400 0
dss2_fck 1920 0
dss1_fck 109714285 0
dss_ick 109714285 1
gfx_ick 109714285 0
gfx_2d_fck 109714285 0
gfx_3d_fck 109714285 0
iva1_mpu_int_ifck 54857142 0
iva1_ifck 109714285 0
dsp_fck 219428571 2
dsp_ick 109714285 1
mpu_ck 329142857 0
emul_ck 5400 1
sys_clkout2 3200 1
sys_clkout 5400 0
ck_wdt1_osc 1920 0
func_12m_ck 1200 0
func_48m_ck 4800 0
func_96m_ck 9600 2
sleep_ck 32000 0
core_ck 658285714 2
func_54m_ck 5400 1
apll54_ck 5400 2
apll96_ck 9600 2
dpll_ck 658285714 1
alt_ck 5400 0
sys_ck 1920 3
osc_ck 1920 1
func_32k_ck 32000 4
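If somebody wants to pull individual clocks out of a dump like this, a rough sketch in C follows. It assumes each entry is a name followed by two unsigned numbers, as in the listing above. The function name and the 32-byte name buffer are my own choices, and the meaning of the second number is left open since the thread does not settle it.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "name first second" entry from the clock dump.
 * Returns 1 if all three fields were read, 0 otherwise.
 * "second" is whatever the second number means; that is not known here. */
int parse_clock_line(const char *line, char name[32],
                     unsigned long *first, unsigned *second)
{
    assert(line && name && first && second);
    /* %31s leaves room for the terminating NUL in the 32-byte buffer. */
    return sscanf(line, "%31s %lu %u", name, first, second) == 3;
}
```

Feeding it the line "mpu_ck 329142857 0" yields a first value of 329142857, which would be roughly 329 MHz if that field is a rate in Hz.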
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:
> Marius Gedminas wrote:
> > On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
> >> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
> >>> Also Nokia 770 runs not at 220MHz as stated on your page, but at
> >>> something closer to 250MHz as shown by this test code program
> >>> (and confirmed to be actually 252MHz by somebody from Nokia
> >>> on #maemo about half a year ago).
> >>
> >> So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Well, if I were to create a conspiracy theory, I would suggest that it could be done on purpose to make N800 look like a bigger improvement when comparing it to 770 ;-)

But most likely it is just a typo; a lot of new docs became available lately, so they may contain some minor inaccuracies.

> > The OMAP1710 page from Texas Instruments also claims 220 MHz is the
> > maximum frequency:
>
> Check /proc/omap_clock on device, it says 252Mhz for both ARM and DSP core.

Hmm, interesting. Can anybody check /proc/omap_clock on an N800 device? I'm particularly curious about the DSP clock frequency (as it can actually be lower than on the 770).
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
Marius Gedminas wrote:
> On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
>> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
>>> Also Nokia 770 runs not at 220MHz as stated on your page, but at
>>> something closer to 250MHz as shown by this test code program
>>> (and confirmed to be actually 252MHz by somebody from Nokia
>>> on #maemo about half a year ago).
>> So http://maemo.org/faq/faq.html#faq-N10129 is lying?
>
> The OMAP1710 page from Texas Instruments also claims 220 MHz is the
> maximum frequency:
> http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=11991&contentId=4670

I've seen 330MHz 1710 units. It's a matter of how nice you are to TI :)

regards,

Koen
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
Marius Gedminas wrote:
> On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
>> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
>>> Also Nokia 770 runs not at 220MHz as stated on your page, but at
>>> something closer to 250MHz as shown by this test code program
>>> (and confirmed to be actually 252MHz by somebody from Nokia
>>> on #maemo about half a year ago).
>> So http://maemo.org/faq/faq.html#faq-N10129 is lying?
>
> The OMAP1710 page from Texas Instruments also claims 220 MHz is the
> maximum frequency:

Check /proc/omap_clock on the device, it says 252 MHz for both the ARM and DSP cores.
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
> On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
> > Also Nokia 770 runs not at 220MHz as stated on your page, but at
> > something closer to 250MHz as shown by this test code program
> > (and confirmed to be actually 252MHz by somebody from Nokia
> > on #maemo about half a year ago).
>
> So http://maemo.org/faq/faq.html#faq-N10129 is lying?

The OMAP1710 page from Texas Instruments also claims 220 MHz is the maximum frequency:

http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=11991&contentId=4670

Marius Gedminas
--
Mosher's Law of Software Engineering: Don't worry if it doesn't work right. If everything did, you'd be out of a job.
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
> Also Nokia 770 runs not at 220MHz as stated on your page, but at
> something closer to 250MHz as shown by this test code program
> (and confirmed to be actually 252MHz by somebody from Nokia
> on #maemo about half a year ago).

So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Marius Gedminas
--
If the code and the comments disagree, then both are probably wrong. -- Norm Schryer
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
2007/1/14, Koen Kooi <[EMAIL PROTECTED]>:
> Siarhei Siamashka wrote:
> > As for optimizing code for ARM (targeting Nokia 770), there are a few things
> > that are slow (maybe this list is still incomplete):
> > 1. Floating point math is slow without vfp (cairo contains a lot of fp math)
>
> Actually not very much if you build it with --disable-some-floatingpoint

I didn't (had forgotten about the whole flag :), but will do (should be interesting).

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
Siarhei Siamashka wrote:
> As for optimizing code for ARM (targeting Nokia 770), there are a few things
> that are slow (maybe this list is still incomplete):
> 1. Floating point math is slow without vfp (cairo contains a lot of fp math)

Actually not very much if you build it with --disable-some-floatingpoint

regards,

Koen
Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:
> We have all sorts of funny hardware at the office, so I thought I'd
> make a quick run of cairo-perf with the Cairo 1.3.10 snapshot and see
> how they relate to each other.
>
> There's some funny things I encountered in the results, and I hope
> people on both lists can offer insights on why.
>
> Details at
>
> http://syslog.movial.fi
>
> but let's just say that the results were predictable in general, with
> some surprises:
>
> N800 is naturally faster than 770, but I didn't expect the xlib
> backend to have so big differences between the two.

Maybe these devices were just running different Linux kernels (the task scheduler may be different) and X servers? So quite a lot of code could be different, and these results can't be used to compare these CPUs directly.

> For the cairo audience there's the question of the tessellation
> process, can it really be so fast on the PXA-320 or is there a bug
> somewhere that twists the results? What could be so good in PXA-320
> (or not-good on the other devices) that the results are so drastic?

What is the amount of cache on all these devices? If the PXA-320 has more cache, and all the necessary code/data for this test fits in it but not in the cache of the competing device, that could explain the difference.

By the way, here you can take some code for benchmarking cpu clock frequency:

https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/testfreq.c?root=mplayer&view=markup

It performs two test runs: the first run contains a loop with 10 add instructions, the second run contains the same loop but empty. Subtracting the time of the second run from the time of the first run gives the time spent executing the add instructions only. The number of such instructions executed per second can be used to measure the cpu clock frequency. For best precision you may want to increase the TESTS_COUNT define, though it will result in a longer test time.
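The subtraction step can be written down as a small C helper. This is only a sketch of the arithmetic, not the code from testfreq.c: the function and parameter names are invented here, the timings are assumed to come from some timer of your choice, and it assumes every dependent add takes exactly one cycle.

```c
#include <assert.h>

/* Estimate the clock frequency in Hz from two timed runs of the same loop:
 * t_adds_sec  - time of the run whose loop body holds the add instructions
 * t_empty_sec - time of the run with the empty loop body
 * Assumes one add per cycle; all names are illustrative. */
double estimate_clock_hz(double t_adds_sec, double t_empty_sec,
                         unsigned long iterations, unsigned adds_per_iteration)
{
    double add_time = t_adds_sec - t_empty_sec; /* time spent in the adds only */
    assert(add_time > 0.0);
    return ((double)iterations * (double)adds_per_iteration) / add_time;
}
```

With made-up numbers, 100000000 iterations of 10 adds taking 5.0 s against 1.0 s for the empty loop would come out at 250 MHz.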
This test program can show results a bit lower than the actual clock frequency (as we have a multitasking OS and other processes also take some time). But the real cpu clock frequency can't be lower than the benchmarked result :) Even on superscalar cpus, these add instructions can't be run in parallel, as each new instruction depends on the result of the previous one (hmm, I just thought that the last add instruction in the loop can be run in parallel with the subs which decreases the loop counter, so maybe some additional tweak will be required).

Also Nokia 770 runs not at 220MHz as stated on your page, but at something closer to 250MHz as shown by this test code program (and confirmed to be actually 252MHz by somebody from Nokia on #maemo about half a year ago).

As for optimizing code for ARM (targeting Nokia 770), there are a few things that are slow (maybe this list is still incomplete):
1. Floating point math is slow without vfp (cairo contains a lot of fp math)
2. Integer division is slow ('/' and '%' operators), as ARM does not have a hardware instruction for it and a much less efficient software implementation is used.
3. Write access to noncached memory is slow with the read-allocate cache on the arm926 core (data is not loaded into the cache on write), see more details here:
   http://maemo.org/pipermail/maemo-developers/2006-December/006579.html
   I have a crude patch for valgrind (callgrind part) to simulate read-allocate cache behaviour (instead of write-allocate, which is simulated by default); it can show the parts of code which have lots of cache misses. If anybody is interested, I can try to clean it up and submit it upstream:
   http://ufo2000.xcomufo.com/maemo/vg-read-allocate-cache-patch.diff

I also had a quick look at the cairo sources (without benchmarking, just to see the general coding style). Some parts of the code are not optimal.
For example, this code chunk from cairo-path-stroke.c relies on integer division (it is unlikely to cause a severe performance decrease here, but may become a real problem for tight loops):

[cut]
    for (i=start; i != stop; i = (i+1) % pen->num_vertices) {
        tri[2] = f->point;
        _translate_point (&tri[2], &pen->vertices[i].point);
        _cairo_traps_tessellate_triangle (stroker->traps, tri);
        tri[1] = tri[2];
    }
[/cut]

If we go deeper into _cairo_traps_tessellate_triangle, we will notice the following:

[cut]
    memcpy (tsort, t, 3 * sizeof (cairo_point_t));
    qsort (tsort, 3, sizeof (cairo_point_t), _compare_point_fixed_by_y);
[/cut]

There is an unnecessary memcpy operation, and qsort is called for just three elements! And such performance bottlenecks are quite easy to spot almost everywhere. Most likely the code that is performance critical is optimized a lot better, but at least this part deserved a comment such as /* I know that it is slow, but this code is not performance critical and I'm too lazy to optimize it */ :-)

Anyway, now I see no surprise that such huge improvements were possible recently
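To make the two remarks concrete, here is a rough sketch in C of what the cheaper alternatives could look like. This is not a patch against cairo: point_t is a hypothetical stand-in for cairo_point_t, both function names are invented, and the comparison only looks at y, which may differ from what _compare_point_fixed_by_y really does.

```c
#include <assert.h>

/* Hypothetical stand-in for cairo_point_t; only y is used for ordering. */
typedef struct { int x, y; } point_t;

/* Advance a ring index without '%', so no software division is needed.
 * Requires 0 <= i < n. */
int next_index(int i, int n)
{
    assert(n > 0 && i >= 0 && i < n);
    return (i + 1 == n) ? 0 : i + 1;
}

/* Exchange two points in place. */
static void swap_points(point_t *a, point_t *b)
{
    point_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Sort exactly three points by ascending y using three compare-and-swap
 * steps instead of a general-purpose qsort call. */
void sort3_by_y(point_t p[3])
{
    if (p[0].y > p[1].y) swap_points(&p[0], &p[1]);
    if (p[1].y > p[2].y) swap_points(&p[1], &p[2]);
    if (p[0].y > p[1].y) swap_points(&p[0], &p[1]);
}
```

In the stroker loop the update would then read i = next_index(i, pen->num_vertices). Whether either change is measurable there is a separate question that only profiling can answer.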
[maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320
Hi!

We have all sorts of funny hardware at the office, so I thought I'd make a quick run of cairo-perf with the Cairo 1.3.10 snapshot and see how they relate to each other.

There are some funny things I encountered in the results, and I hope people on both lists can offer insights on why.

Details at

http://syslog.movial.fi

but let's just say that the results were predictable in general, with some surprises:

N800 is naturally faster than 770, but I didn't expect the xlib backend to have such big differences between the two.

For the cairo audience there's the question of the tessellation process: can it really be so fast on the PXA-320, or is there a bug somewhere that twists the results? What could be so good in PXA-320 (or not-good on the other devices) that the results are so drastic?

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi