Thanks for looking at this, and for the information, Yann. I just discovered the benchmarks folder, which I'll use from now on to benchmark more accurately.

After running make in the benchmarks folder, I ran bench.sh, but it only seems to run copy1. Does bench.sh need to be modified to work on OSX?

cheers,

oli

On 7 Apr 2013, at 17:50, Orlarey Yann wrote:

> Hi Oli,
>
> The ~10x speedup of your version 2 is due to the fact that all the
> parallel strings not only have the same controls, but are also applied
> to the same two input signals. This leads to redundant computations
> that the Faust compiler is able to discover and factorize.
>
> Here are my results, quite similar to yours. All tests were compiled
> in scalar and vector modes using icc 13.1.0 and performed as alsa-gtk
> applications on an Asus Zenbook quad-core i7-3517U CPU @ 1.90GHz
> running Linux Mint 14. The results are in CPU usage.
>
>
> 1) Results for test1.dsp (your version 1, multiple controls)
> ------------------------------------------------------------
> Test1.dsp is your version 1 example.
>
>     test1, scalar mode    0.23%
>     test1, vector mode    0.21%
>
> Scalar and vector modes have similar results, even if vector mode is a
> little bit faster here.
>
>
> 2) Results for test2y.dsp (single controls)
> -------------------------------------------
> Test2y.dsp is similar to your version 2, but derived from test1.dsp by
> simply removing all "...%1i..." from the slider labels, thus leading to
> single controls.
>
>     test2y, scalar mode    0.014%
>     test2y, vector mode    0.022%
>
> As in your experiments, test2y is ~10x faster than test1 in vector
> mode. The speedup is even better in scalar mode.
>
> If we look at the size of the generated code for test1.dsp and
> test2y.dsp we have:
>
>     faust test1.dsp | wc  => 155917 characters
>     faust test2y.dsp | wc => 21229 characters
>
> As we can see, the test2y C++ translation is ~7x shorter due to many
> redundant computations that the Faust compiler was able to discover
> and factorize. Because of this, the C++ compilation time is also
> shorter.
>
> To analyze the influence of redundant computations vs fewer controls
> we can modify test2y.dsp to have separate inputs instead of stereo
> inputs. Because the strings will be applied to different inputs they
> won't be factorized anymore and we should have performance close to
> test1.
>
>
> 3) Results for test2y4.dsp
> --------------------------
> Test2y4.dsp is derived from test2y.dsp by modifying the stringbox
> definition from:
>
>     stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>
> to:
>
>     stringbox(n) = par(s, n, stereostring(_, _, s)) :> _ , _;
>
> in order to have the strings applied to different inputs and avoid
> factorizations.
>
> The size of the C++ code is now larger:
>
>     faust test2y4.dsp | wc => 118218 characters
>
> and the performance is similar to test1:
>
>     test2y4, scalar mode    0.15%
>     test2y4, vector mode    0.12%
>
> Test2y4 is still faster than test1 because it has fewer controls to
> compute.
>
>
> 4) Results for test1smoothless.dsp
> ----------------------------------
> But if we simplify the control signals of test1 by removing all the
> smooth, then test1smoothless.dsp outperforms test2y4 in vector mode.
>
>     test1smoothless, scalar mode    0.15%
>     test1smoothless, vector mode    0.09%
>
> Obviously it is probably not a good idea to remove all smooth, but
> performance gains can probably be obtained by reorganizing them.
> In particular it is better, in terms of performance, to smooth after
> expensive computations like pow and similar than before.
>
> Cheers
>
> Yann
>
>
> On 20/03/13 20:04, Oli Larkin wrote:
>> Hi,
>>
>> I have been experimenting with -vec in a .dsp involving multiple parallel
>> string resonators. This morning I was amazed at the performance boost I got,
>> but when I added multiple controls to my .dsp file things slowed down a lot.
>> Compilation takes much longer and when it finishes the compiled .vst is much
>> slower than the version with single controls (roughly 10x slower I think).
>> I realise there is more smoothing taking place, but I'm wondering if there is
>> something else in play that is causing the compiler not to vectorize the
>> code as well as with single controls.
>>
>> I'm compiling on osx 10.6 with faust Version 0.9.59, like this
>>
>> faust2vst stringbox.dsp -vec
>>
>> my system is a 2010 MBP, i7
>>
>> below are three versions of my faust .dsp. Strangely the third version
>> compiles and runs quickly making me think that the nested parallel
>> structures are causing problems for the auto vectorization.
>>
>> thanks very much for any tips,
>>
>> oli larkin
>>
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 1 (multiple controls, slow)
>>
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>> import("filter.lib");
>> dtmax = 4096;
>>
>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>>
>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>> with {
>>     freq = f(s) + ((c-4) * fd);
>>     coeff = damp(s);
>>     dtsamples = (SR/freq) - 2;
>>     fbk = pow(0.001,1.0/( freq*t60(s)));
>>
>>     h0 = (1. + coeff)/2;
>>     h1 = (1. - coeff)/4;
>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>>
>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g(s);
>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>>
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 2 (single controls, fast)
>>
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>> import("filter.lib");
>> dtmax = 4096;
>>
>> f = hslider("A_freq", 100, 20, 15000, 1) : smooth(0.999);
>> t60 = hslider("B_decay", 4, 0, 60, 0.01) : smooth(0.999);
>> damp = hslider("C_damp", 1., 0, 1, 0.01) : smooth(0.999);
>> g = hslider("D_gain", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>>
>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>> with {
>>     freq = f + ((c-4) * fd);
>>     dtsamples = (SR/freq) - 2;
>>     fbk = pow(0.001,1.0/( freq*t60));
>>
>>     h0 = (1. + damp)/2;
>>     h1 = (1. - damp)/4;
>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>>
>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g;
>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>>
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 3 (multiple controls, 1 comb filter per string, instead of 9, fast)
>>
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>> import("filter.lib");
>> dtmax = 4096;
>>
>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>>
>> stringloop(x, s) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * fbk) : dcblocker
>> with {
>>     freq = f(s);
>>     coeff = damp(s);
>>     dtsamples = (SR/freq) - 2;
>>     fbk = pow(0.001,1.0/( freq*t60(s)));
>>
>>     h0 = (1. + coeff)/2;
>>     h1 = (1. - coeff)/4;
>>     dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>>
>> stereostring(L, R, s) = stringloop(L, s), stringloop(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>>
>> _______________________________________________
>> Faudiostream-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/faudiostream-users
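
The closing advice in Yann's reply, to smooth after expensive computations such as pow rather than before, can be sketched in Faust roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names fraw, t60raw, fbk_before and fbk_after are hypothetical, and it relies on the same filter.lib smooth used in the stringbox examples. The idea is that an expression built only from raw slider values can be evaluated once per block, whereas an expression fed by smoothed signals has to be evaluated for every sample.

```faust
import("filter.lib");

// Raw (unsmoothed) slider values; names are hypothetical.
fraw   = hslider("A_freq", 100, 20, 15000, 1);
t60raw = hslider("B_decay", 4, 0, 60, 0.01);

// Smooth BEFORE pow, as in the stringbox versions above: the inputs of
// pow are smoothed signals, so pow is evaluated at sample rate.
fbk_before = pow(0.001, 1.0/((fraw : smooth(0.999)) * (t60raw : smooth(0.999))));

// Smooth AFTER pow: pow only sees slider values, so it can be evaluated
// at control rate, and only the cheap one-pole smooth runs per sample.
fbk_after = pow(0.001, 1.0/(fraw * t60raw)) : smooth(0.999);

// Apply both gains to the same input so the two outputs can be compared.
process = _ <: *(fbk_before), *(fbk_after);
```

The two variants are not strictly equivalent while a slider is moving, since smoothing the output of pow follows a different trajectory than taking pow of smoothed inputs; whether that difference is audible in the string model would need listening tests. In the stringbox code the frequency also feeds dtsamples, so it would presumably still need its own smoothing there.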
