Thanks for looking at this and for the information, Yann. I just discovered the 
benchmarks folder, which I'll use from now on to benchmark more accurately.

After running make in the benchmarks folder, I ran bench.sh, but it just seems 
to run copy1. Does bench.sh need to be modified to work on OS X?

cheers,

oli

On 7 Apr 2013, at 17:50, Orlarey Yann wrote:

> Hi Oli,
> 
> The ~10x speedup of your version 2 is due to the fact that all the
> parallel strings not only have the same controls, but are also applied
> to the same two input signals. This leads to redundant computations
> that the Faust compiler is able to discover and factorize.
> 
> Here are my results, quite similar to yours. All tests were compiled
> in scalar and vector modes using icc 13.1.0 and run as alsa-gtk
> applications on an Asus Zenbook quad-core i7-3517U CPU @ 1.90GHz
> running Linux Mint 14. The results are in CPU usage.
> 
> 
> 1) Results for test1.dsp (your version 1, multiple controls)
> ------------------------------------------------------------
> Test1.dsp is your version 1 example.
> 
> test1, scalar mode  0.23%
> test1, vector mode  0.21%
> 
> Scalar and vector modes have similar results, even if vector mode is a
> little bit faster here.
> 
> 
> 2) Results for test2y.dsp (single controls)
> -------------------------------------------
> Test2y.dsp is similar to your version 2, but derived from test1.dsp by
> simply removing all "...%1i..." from the slider labels, thus leading to
> single controls.
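
A minimal sketch of the label change described above, for readers who want to see it in isolation. The names fmulti and fsingle are hypothetical and do not appear in test1.dsp or test2y.dsp; the slider parameters are copied from the A_freq slider in the original post. The exact label text produced by "%1i" is an assumption.

```faust
// Sketch only: fmulti and fsingle are hypothetical names.
// "%1i" is replaced by the value of i, so each call gets its own label
// (presumably "A_freq0", "A_freq1", ...) and therefore its own slider.
fmulti(i) = hslider("A_freq%1i", 100, 20, 15000, 1);

// Without the index, every call yields the same label, hence a single
// slider shared by all strings.
fsingle(i) = hslider("A_freq", 100, 20, 15000, 1);

// 8 independent control signals, then 8 copies of one control signal.
process = par(i, 8, fmulti(i)), par(i, 8, fsingle(i));
```

With fsingle, all eight branches are the same expression, which is what allows the compiler to share work between strings.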
> 
> test2y, scalar mode  0.014%
> test2y, vector mode  0.022%
> 
> As in your experiments, test2y is ~10x faster than test1 in vector
> mode (0.21% vs 0.022%). The speedup is even better in scalar mode
> (0.23% vs 0.014%, about 16x).
> 
> If we look at the size of the generated code for test1.dsp and
> test2y.dsp we have:
> 
> faust test1.dsp  | wc  => 155917 characters
> faust test2y.dsp | wc  =>  21229 characters
> 
> As we can see, the C++ translation of test2y is ~7x shorter, thanks to
> the many redundant computations that the Faust compiler was able to
> discover and factorize. Because of this, the C++ compilation time is
> also shorter.
> 
> To analyze the influence of redundant computations vs fewer controls,
> we can modify test2y.dsp to have separate inputs instead of stereo
> inputs. Because the strings will be applied to different inputs, they
> won't be factorized anymore and we should get performance close to test1.
> 
> 
> 3) Results for test2y4.dsp
> --------------------------
> Test2y4.dsp is derived from test2y.dsp by changing the stringbox
> definition from:
> 
> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
> 
> to:
> 
> stringbox(n) =  par(s, n, stereostring(_, _, s)) :> _ , _;
> 
> in order to have the strings applied to different inputs and avoid
> factorizations.
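
The effect of that one-line change can be sketched on a much smaller example. The voice definition below is a hypothetical stand-in for one string and is not taken from any of the test files; only the `_ <: par(...) :> _` versus `par(...) :> _` structure mirrors the two stringbox definitions.

```faust
// Hypothetical stand-in for one (expensive) string: a gain into a
// one-pole feedback loop.
voice = *(0.5) : + ~ *(0.9);

// Same input fanned out to identical branches: every branch computes the
// same signal, so the compiler can compute it once and reuse the result.
shared = _ <: par(i, 4, voice) :> _;

// One input per branch: the four signals differ, so nothing can be shared
// and all four voices have to be computed.
separate = par(i, 4, voice) :> _;

process = shared, separate;
```

Comparing the generated C++ for shared and separate, in the same way the character counts are compared in this message, should show the difference, although the exact sizes will depend on the compiler version.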
> 
> The size of the C++ code is now larger:
> 
> faust test2y4.dsp | wc => 118218 characters
> 
> and the performance is similar to test1:
> 
> test2y4, scalar mode  0.15%
> test2y4, vector mode  0.12%
> 
> Test2y4 is still faster than test1 because it has fewer controls to
> compute.
> 
> 4) Results of test1smoothless.dsp
> ---------------------------------
> But if we simplify the control signals of test1 by removing all the
> smooth operations, test1smoothless.dsp outperforms test2y4 in vector mode.
> 
> test1smoothless, scalar mode  0.15%
> test1smoothless, vector mode  0.09%
> 
> Obviously it is probably not a good idea to remove all the smoothing,
> but performance gains can probably be obtained by reorganizing it.
> In particular it is better, in terms of performance, to smooth after
> expensive computations like pow than before them.
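
A minimal sketch of the "smooth after pow" reorganization, using the fbk expression from the original stringbox.dsp. The names fbk_before and fbk_after are hypothetical, and the one-pole smoother is written out inline (it is assumed to match what filter.lib provides) so the sketch needs no import. The reasoning in the comments assumes that expressions depending only on sliders are evaluated once per block rather than once per sample.

```faust
// One-pole smoother, written out so the sketch is self-contained.
smooth(s) = *(1.0 - s) : + ~ *(s);

freq = hslider("A_freq", 100, 20, 15000, 1);
t60  = hslider("B_decay", 4, 0, 60, 0.01);

// Smooth first: the smoothed slider is an audio-rate signal, so pow has to
// be evaluated for every sample.
fbk_before = pow(0.001, 1.0 / (freq * (t60 : smooth(0.999))));

// Smooth last: pow only sees slider values, so it can be evaluated once per
// block; only the cheap smoother runs at audio rate.
fbk_after = pow(0.001, 1.0 / (freq * t60)) : smooth(0.999);

process = *(fbk_before), *(fbk_after);
```

The two versions are not numerically identical while a slider is moving, since one smooths the decay time and the other smooths the resulting feedback gain, so the change should be checked by ear as well as in the CPU figures.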
> 
> Cheers
> 
> Yann
> 
> On 20/03/13 20:04, Oli Larkin wrote:
>> Hi,
>> 
>> I have been experimenting with -vec in a .dsp involving multiple parallel 
>> string resonators. This morning I was amazed at the performance boost I got, 
>> but when I added multiple controls to my .dsp file things slowed down a lot. 
>> Compilation takes much longer and when it finishes the compiled .vst is much 
>> slower than the version with single controls (roughly 10x slower I think). I 
>> realise there is more smoothing taking place, but I'm wondering if there is 
>> something else in play that is causing the compiler not to vectorize the 
>> code as well as with single controls.
>> 
>> I'm compiling on osx 10.6 with faust Version 0.9.59, like this
>> 
>> faust2vst stringbox.dsp -vec
>> 
>> my system is a 2010 MBP, i7
>> 
>> below are three versions of my faust .dsp. Strangely the third version 
>> compiles and runs quickly making me think that the nested parallel 
>> structures are causing problems for the auto vectorization.
>> 
>> thanks very much for any tips,
>> 
>> oli larkin
>> 
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 1 (multiple controls, slow)
>> 
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>>  import("filter.lib");
>> dtmax = 4096;
>> 
>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>> 
>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * 
>> fbk) : dcblocker
>> with {
>>      freq = f(s) + ((c-4) * fd);
>>      coeff = damp(s);
>>      dtsamples = (SR/freq) - 2;
>>      fbk = pow(0.001,1.0/( freq*t60(s)));
>> 
>>      h0 = (1. + coeff)/2;
>>      h1 = (1. - coeff)/4;
>>      dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>> 
>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g(s);
>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>> 
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 2 (single controls, fast)
>> 
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>>  import("filter.lib");
>> dtmax = 4096;
>> 
>> f = hslider("A_freq", 100, 20, 15000, 1) : smooth(0.999);
>> t60 = hslider("B_decay", 4, 0, 60, 0.01) : smooth(0.999);
>> damp = hslider("C_damp", 1., 0, 1, 0.01) : smooth(0.999);
>> g = hslider("D_gain", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>> fd = hslider("E_diff", 0., 0., 1., 0.0001) : smooth(0.999);
>> 
>> stringloop(x, s, c) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * 
>> fbk) : dcblocker
>> with {
>>      freq = f + ((c-4) * fd);
>>      dtsamples = (SR/freq) - 2;
>>      fbk = pow(0.001,1.0/( freq*t60));
>> 
>>      h0 = (1. + damp)/2;
>>      h1 = (1. - damp)/4;
>>      dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>> 
>> rissetstring(x, s) = _ <: par(c, 9, stringloop(x, s, c)) :> _*0.01*g;
>> stereostring(L, R, s) = rissetstring(L, s), rissetstring(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>> 
>> // ----------------------------------------------------------------------
>> // stringbox.dsp VERSION 3 (multiple controls, 1 comb filter per string
>> // instead of 9, fast)
>> 
>> declare name "StringBox";
>> declare description "Bank of 8 virtual strings";
>> declare author "Oli Larkin ([email protected])";
>> declare copyright "Oliver Larkin";
>> declare version "0.1";
>> declare licence "GPL";
>>  import("filter.lib");
>> dtmax = 4096;
>> 
>> f(i) = hslider("A_freq%1i", 100, 20, 15000, 1) : smooth(0.999);
>> t60(i) = hslider("B_decay%1i", 4, 0, 60, 0.01) : smooth(0.999);
>> damp(i) = hslider("C_damp%1i", 1., 0, 1, 0.01) : smooth(0.999);
>> g(i) = hslider("D_gain%1i", 0, -70, 0., 0.1) : db2linear : smooth(0.999);
>> 
>> stringloop(x, s) = (+ : fdelay1a(dtmax, dtsamples, x)) ~ (dampingfilter * 
>> fbk) : dcblocker
>> with {
>>      freq = f(s);
>>      coeff = damp(s);
>>      dtsamples = (SR/freq) - 2;
>>      fbk = pow(0.001,1.0/( freq*t60(s)));
>> 
>>      h0 = (1. + coeff)/2;
>>      h1 = (1. - coeff)/4;
>>      dampingfilter(x) = (h0 * x' + h1*(x+x''));
>> };
>> 
>> stereostring(L, R, s) = stringloop(L, s), stringloop(R, s);
>> stringbox(n) = _ , _ <: par(s, n, stereostring(_, _, s)) :> _ , _;
>> process = stringbox(8);
>> 
>> 
>> 
>> _______________________________________________
>> Faudiostream-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/faudiostream-users
>> 
>> 
> 

