On 12/04/13 12:30, Oli Larkin wrote:
> Thanks for looking at this and for the information Yann. I just discovered
> the benchmarks folder, which I'll use from now on to benchmark more
> accurately.
>
> after running make in the benchmarks folder, I ran bench.sh, but it just
> seems to run copy1. Does bench.sh need to be modified to work on OSX?

I ran my tests on a Linux machine with a modified version of the benchmarks,
not the one shipped with the Faust distribution. I am not sure the benchmarks
work on OSX, but if you want to try, you must use the coreaudio targets, for
example:

    make gcoreaudioscal

Cheers

Yann

> cheers,
>
> oli
>
> On 7 Apr 2013, at 17:50, Orlarey Yann wrote:
>
>> Hi Oli,
>>
>> The ~10x speedup of your version 2 is due to the fact that all the
>> parallel strings not only have the same controls, but are also applied
>> to the same two input signals. This leads to redundant computations
>> that the Faust compiler is able to discover and factorize.
>>
>> Here are my results, quite similar to yours. All tests were compiled
>> in scalar and vector modes using icc 13.1.0 and run as alsa-gtk
>> applications on an Asus Zenbook quad-core i7-3517U CPU @ 1.90GHz
>> running Linux Mint 14. The results are in CPU usage.
>>
>>
>> 1) Results for test1.dsp (your version 1, multiple controls)
>> ------------------------------------------------------------
>> Test1.dsp is your version 1 example.
>>
>>   test1, scalar mode    0.23%
>>   test1, vector mode    0.21%
>>
>> Scalar and vector modes have similar results, even if vector mode is a
>> little bit faster here.
>>
>>
>> 2) Results for test2y.dsp (single controls)
>> -------------------------------------------
>> Test2y.dsp is similar to your version 2, but derived from test1.dsp by
>> simply removing all "...%1i..." from the slider labels, thus leading to
>> single controls.
>>
>>   test2y, scalar mode    0.014%
>>   test2y, vector mode    0.022%
>>
>> As in your experiments, test2y is ~10x faster than test1 in vector
>> mode. The speedup is even better in scalar mode.
>>
>> If we look at the size of the generated code for test1.dsp and
>> test2y.dsp we have:
>>
>>   faust test1.dsp  | wc => 155917 characters
>>   faust test2y.dsp | wc =>  21229 characters
>>
>> As we can see, the test2y C++ translation is ~7x shorter due to the many
>> redundant computations that the Faust compiler was able to discover
>> and factorize. Because of this, the C++ compilation time is also
>> shorter.
>>
>> To analyze the influence of redundant computations vs fewer controls
>> we can modify test2y.dsp to have separate inputs instead of stereo
>> inputs.
>> Because the strings will be applied to different inputs, they won't
>> be factorized anymore and we should see performance close to test1.
>>
>>
>> 3) Results for test2y4.dsp
>> --------------------------
>> Test2y4.dsp is derived from test2y.dsp by changing the stringbox
>> definition from:
>>
>>   stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>>
>> to:
>>
>>   stringbox(n) = par(s, n, stereostring(_, _, s)) :> _ , _;
>>
>> in order to have the strings applied to different inputs and avoid
>> factorizations.
>>
>> The size of the C++ code is now larger:
>>
>>   faust test2y4.dsp | wc => 118218 characters
>>
>> and the performance is similar to test1:
>>
>>   test2y4, scalar mode    0.15%
>>   test2y4, vector mode    0.12%
>>
>> Test2y4 is still faster than test1 because it has fewer controls to
>> compute.
>>
>>
>> 4) Results for test1smoothless.dsp
>> ----------------------------------
>> But if we simplify the control signals of test1 by removing all the
>> smooth, then test1smoothless.dsp outperforms test2y4 in vector mode.
>>
>>   test1smoothless, scalar mode    0.15%
>>   test1smoothless, vector mode    0.09%
>>
>> Obviously it is probably not a good idea to remove all the smooth, but
>> performance gains can probably be obtained by reorganizing them.
>> In particular it is better, in terms of performance, to smooth after
>> expensive computations like pow than before.
>>
>> Cheers
>>
>> Yann
>>
>>
>> On 20/03/13 20:04, Oli Larkin wrote:
>>> Hi,
>>>
>>> I have been experimenting with -vec in a .dsp involving multiple parallel
>>> string resonators. This morning I was amazed at the performance boost I
>>> got, but when I added multiple controls to my .dsp file things slowed down
>>> a lot. Compilation takes much longer and when it finishes the compiled .vst
>>> is much slower than the version with single controls (roughly 10x slower I
>>> think).
>>> I realise there is more smoothing taking place, but I'm wondering
>>> if there is something else in play that is causing the compiler not to
>>> vectorize the code as well as with single controls.
>>>
>>> I'm compiling on osx 10.6 with faust Version 0.9.59, like this
>>>
>>>   faust2vst stringbox.dsp -vec
>>>
>>> my system is a 2010 MBP, i7
>>>
>>> below are three versions of my faust .dsp. Strangely the third version
>>> compiles and runs quickly making me think that the nested parallel
>>> structures are causing problems for the auto vectorization.
>>>
>>> thanks very much for any tips,
>>>
>>> oli larkin
>>>
>>> // ----------------------------------------------------------------------
>>> // stringbox.dsp VERSION 1 (multiple controls, slow)
>>>
>>> declare name "StringBox";
>>> declare description "Bank of 8 virtual strings";
>>> declare author "Oli Larkin ([email protected])";
>>> declare copyright "Oliver Larkin";
>>> declare version "0.1";
>>> declare licence "GPL";
>>> import("filter.lib");
>>> dtmax = 4096;
>>>
>>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>>>
>>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>>> with {
>>>     freq = f(s) + ((c-4) * fd);
>>>     coeff = damp(s);
>>>     dtsamples = (SR/freq) - 2;
>>>     fbk = pow(0.001,1.0/( freq*t60(s)));
>>>
>>>     h0 = (1. + coeff)/2;
>>>     h1 = (1. - coeff)/4;
>>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>>> };
>>>
>>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g(s);
>>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>>> process = stringbox(8);
>>>
>>> // ----------------------------------------------------------------------
>>> // stringbox.dsp VERSION 2 (single controls, fast)
>>>
>>> declare name "StringBox";
>>> declare description "Bank of 8 virtual strings";
>>> declare author "Oli Larkin ([email protected])";
>>> declare copyright "Oliver Larkin";
>>> declare version "0.1";
>>> declare licence "GPL";
>>> import("filter.lib");
>>> dtmax = 4096;
>>>
>>> f = hslider("A_freq", 100, 20, 15000, 1) : smooth(0.999);
>>> t60 = hslider("B_decay", 4, 0, 60, 0.01) : smooth(0.999);
>>> damp = hslider("C_damp", 1., 0, 1, 0.01) : smooth(0.999);
>>> g = hslider("D_gain", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>>>
>>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>>> with {
>>>     freq = f + ((c-4) * fd);
>>>     dtsamples = (SR/freq) - 2;
>>>     fbk = pow(0.001,1.0/( freq*t60));
>>>
>>>     h0 = (1. + damp)/2;
>>>     h1 = (1. - damp)/4;
>>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>>> };
>>>
>>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g;
>>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>>> process = stringbox(8);
>>>
>>> // ----------------------------------------------------------------------
>>> // stringbox.dsp VERSION 3 (multiple controls, 1 comb filter per string, instead of 9, fast)
>>>
>>> declare name "StringBox";
>>> declare description "Bank of 8 virtual strings";
>>> declare author "Oli Larkin ([email protected])";
>>> declare copyright "Oliver Larkin";
>>> declare version "0.1";
>>> declare licence "GPL";
>>> import("filter.lib");
>>> dtmax = 4096;
>>>
>>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>>>
>>> stringloop(x, s) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>>> with {
>>>     freq = f(s);
>>>     coeff = damp(s);
>>>     dtsamples = (SR/freq) - 2;
>>>     fbk = pow(0.001,1.0/( freq*t60(s)));
>>>
>>>     h0 = (1. + coeff)/2;
>>>     h1 = (1. - coeff)/4;
>>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>>> };
>>>
>>> stereostring(L, R, s) = stringloop(L, s), stringloop(R, s);
>>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>>> process = stringbox(8);
>>>
>>> _______________________________________________
>>> Faudiostream-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/faudiostream-users
