Re: On optimizing Theora
Tiago Marques wrote:
> Can you please try both options with also the following
> ones: -ftree-vectorize -funroll-loops -m3dnow

(1) libtheora automatically adds the flags "-O3 -fforce-addr -fomit-frame-pointer -finline-functions -funroll-loops" to any specified CFLAGS.

(2) libtheora's inner loops are largely hand-optimized MMX assembly, so vectorization and 3dnow are unlikely to have a significant impact.

(3) I am not particularly interested in trolling through every combination of relevant gcc flags in search of performance benefit. That's the compiler's (and compiler writers') job.

My point, instead, was that gcc (at least the version in 767) does not have a good code generator for Geode, and therefore we should not expect any performance increase by rebuilding everything -march=geode.

If you are interested in searching for the perfect compiler flags, perhaps you would like to try Acovea (http://www.coyotegulch.com/products/acovea/).

--Ben

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: On optimizing Theora
Hi,

Can you please try both options with the following flags added as well: -ftree-vectorize -funroll-loops -m3dnow

Also, it may be a good idea to test both geode and i586 with -m3dnow and -mno-3dnow, since the compiler may be causing problems while vectorizing.

Another option is to also test i486 builds, as per what I had already found in this thread:
http://geode.insideo.net/info-linux_archives/msg00396.html

"Let me underscore my colleague's statement. Do not use the 586 target. In testing we've found that the 586 "optimized" version can be up to 3x slower vs. the 386/486 versions on the Geode LX."

This is probably because the Geode LX is not a superscalar processor (while the i586 is), which may cause problems even with -march=i586.

Best regards,
Tiago Marques

On Fri, Feb 20, 2009 at 2:23 PM, Benjamin M. Schwartz <bmsch...@fas.harvard.edu> wrote:
> Tomeu Vizoso wrote:
> > On Fri, Feb 20, 2009 at 06:41, wrote:
> >> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
> >>> GCC 4.3 evidently does not do a very good job of optimizing for geode.
> >> What percentage of CPU time was spent in libtheora?
>
> 100%. The encoder was operating in a continuous loop.
>
> > Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
> > were involved during your tests, you may have seen little of theora
> > itself.
>
> Neither X nor jffs2 was involved. The input file (y4m or ogv) was cached
> in memory, and the output stream (ogv or y4m) was being sent directly to
> /dev/null, and not displayed.
>
> The only action being taken in X was to display, in the Terminal activity,
> a text-only progress bar, rendered by the encoder_example, or dump_video
> command. These commands are part of libtheora, and were recompiled with
> it, so the point remains.
>
> --Ben
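The combinations suggested in this message can be enumerated mechanically. The following is only a sketch: the helper name cflags_matrix is hypothetical, the -march values and extra flags are the ones mentioned in this thread, and GCC spells the negative 3DNow! option -mno-3dnow. It only builds the CFLAGS strings; it does not run any build.

```python
import itertools

# Values taken from the suggestions in this thread.
MARCH = ["geode", "i586", "i486"]
THREEDNOW = ["-m3dnow", "-mno-3dnow"]
EXTRA = "-ftree-vectorize -funroll-loops"


def cflags_matrix():
    """Return one CFLAGS string per (march, 3DNow! setting) combination."""
    return [
        "-march=%s %s %s" % (march, EXTRA, tdn)
        for march, tdn in itertools.product(MARCH, THREEDNOW)
    ]


for flags in cflags_matrix():
    # Each line is a candidate setting to pass to the libtheora build.
    print('CFLAGS="%s"' % flags)
```

Three targets times two 3DNow! settings gives six builds, each of which would need its own encode and decode timing run.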
Re: On optimizing Theora
Tomeu Vizoso wrote:
> On Fri, Feb 20, 2009 at 06:41, wrote:
>> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
>>> GCC 4.3 evidently does not do a very good job of optimizing for geode.
>> What percentage of CPU time was spent in libtheora?

100%. The encoder was operating in a continuous loop.

> Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
> were involved during your tests, you may have seen little of theora
> itself.

Neither X nor jffs2 was involved. The input file (y4m or ogv) was cached in memory, and the output stream (ogv or y4m) was being sent directly to /dev/null, and not displayed.

The only action being taken in X was to display, in the Terminal activity, a text-only progress bar, rendered by the encoder_example, or dump_video command. These commands are part of libtheora, and were recompiled with it, so the point remains.

--Ben
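For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a harness that records only the user CPU time of a child process and discards its standard output, in the spirit of the setup described above. The function name user_times is hypothetical, the workload shown is a stand-in rather than the real encoder, and os.times() only reports child CPU time on Unix-like systems.

```python
import os
import statistics
import subprocess
import sys


def user_times(cmd, runs=5):
    """Run cmd `runs` times, discarding stdout; return user CPU seconds per run."""
    samples = []
    for _ in range(runs):
        before = os.times().children_user
        # stdout goes to the null device, so no output file is written.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        # children_user accumulates the user time of waited-for children.
        samples.append(os.times().children_user - before)
    return samples


# Stand-in workload (an assumption for illustration); on an XO one would pass
# something like ["encoder_example", "-v", "1", "coastguard_cif.y4m"] instead.
samples = user_times([sys.executable, "-c", "sum(range(10**6))"], runs=3)
print("%.2f +/- %.2f s user" % (statistics.mean(samples), statistics.stdev(samples)))
```

Reading the input file once beforehand, so that it sits in the page cache, would match the "cached in memory" condition described in the message.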
Re: On optimizing Theora
On Fri, Feb 20, 2009 at 06:41, wrote:
> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
>> GCC 4.3 evidently does not do a very good job of optimizing for geode.
>
> What percentage of CPU time was spent in libtheora?

Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they were involved during your tests, you may have seen little of theora itself.

Regards,

Tomeu
Re: On optimizing Theora
On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
> GCC 4.3 evidently does not do a very good job of optimizing for geode.

What percentage of CPU time was spent in libtheora?

--
James Cameron
mailto:qu...@us.netrek.org
http://quozl.netrek.org/
On optimizing Theora
I have been testing libtheora-1.0 on a MP XO. On build 767, using F9's gcc-4.3, I compiled libtheora with CFLAGS="-march=geode". I tested encode, with the command

time encoder_example -v 1 coastguard_cif.y4m > /dev/null

using the test video from http://media.xiph.org/video/derf/y4m/coastguard_qcif.y4m. This test ran in 44.15 +/- 0.15 seconds (all times are "user" time).

I then tested decode, with the command

time dump_video coastguard_cif1.ogv > /dev/null

using the ogg video that would be produced by the encoder above were it not redirected to /dev/null. This test ran in 4.60 +/- 0.05 seconds.

I then repeated these tests after recompiling with "-march=i586 -mtune=generic", which I assume are approximately the CFLAGS used by Fedora. The resultant times were 41.6 +/- 0.1 and 4.45 +/- 0.05.

In conclusion, compiling libtheora with "-march=geode" causes it to run significantly (20 sigma, 7%) slower than "-march=i586 -mtune=generic" for encoding, and possibly slightly slower for decoding as well. GCC 4.3 evidently does not do a very good job of optimizing for geode.

--Ben
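The arithmetic behind the comparison can be redone from the four timings quoted in this message. This is a sketch under two assumptions that the message does not state: the slowdown is taken relative to the faster (i586) build, and the two uncertainties are independent, so they combine in quadrature. The function name compare is hypothetical.

```python
import math


def compare(slow, slow_err, fast, fast_err):
    """Return (percent slowdown relative to `fast`, difference in combined sigmas)."""
    diff = slow - fast
    percent = 100.0 * diff / fast
    # Assumption: independent errors, so they add in quadrature.
    sigma = math.hypot(slow_err, fast_err)
    return percent, diff / sigma


# Timings (seconds of user time) as reported above: geode build vs. i586 build.
encode = compare(44.15, 0.15, 41.6, 0.1)
decode = compare(4.60, 0.05, 4.45, 0.05)

print("encode: %.1f%% slower, %.1f sigma" % encode)  # encode: 6.1% slower, 14.1 sigma
print("decode: %.1f%% slower, %.1f sigma" % decode)  # decode: 3.4% slower, 2.1 sigma
```

Under these assumptions the encode gap comes out at roughly 6% and 14 sigma, somewhat below the rounded figures quoted in the message but leading to the same conclusion; the decode gap of about 2 sigma is consistent with "possibly slightly slower".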