Re: [elm-discuss] Re: Elm "faster than JavaScript" (version 0.18) - NOT - Parts I and II...

GordonBGood Wed, 04 Jan 2017 21:33:07 -0800

On Wednesday, 4 January 2017 21:05:58 UTC+7, Bob Zhang wrote:
>
> Hi Gordon, 
>   As you can see, BuckleScript already did a very good good job here, of 
> course, there are still places for improvement. To write extremely high 
> performance code, you should avoid creating a closure in a tight loop, here 
> you can lift the `nxtc` into the top level, in the future, we will do this 
> inside the compiler. Lets take further discussions off the list -- Hongbo
>

Hi Hongbo, yes, BuckleScript did a very good job other than 1) mistakenly 
determining that the closure was necessary, then 2) not automatically 
lifting the closure, and then 3) not inlining the resulting (non-closure) 
function call away (see Haskell join points optimizations).  I'm just 
sayin' ;)  In case you needed adding to your TODO list...

In fact, BuckleScript *already does* catch most cases of these, and didn't 
in this one case because the enclosing loop has been tail call optimized 
(manually in case of the code I showed) so become a "ugly imperative code" 
loop internally, and the compiler then treats the binding `p` derived from 
an impure loop variable as impure and thus needing closure treatment, which 
seems to cause the resulting function to be invalidated as a candidate for 
inlining.  The compiler needs to see that the the binding `p` is invariant 
(in fact guaranteed as the compiler injected a let on the loop variable) 
during the whole creation and execution of the `nxtc` function, therefore 
doesn't need the closure and thus can be inlinee.  (BTW, it seems that 
BuckleScript ignores the [@@inline] and [@inlined] OCaml extensions?)

If  the `nxtc` function is lifted to the top level, then it needs to be 
written so that the otherwise free bindings are arguments (`p` and `cmpsts` 
in this case) and thus it is no longer a closure, and even if there were an 
inner worker closure, BuckleScript already inlines and tail calls when 
captured bindings are arguments or derived from arguments of a function 
call, recognizing that they are pure.  There will still be a function call 
to the lifted `nxtc` where there wouldn't be if it were inlined however, 
which means that inlining would be better, although the function call has 
negligible cost relative to the rest of the work in this case.

The only reason this was noticeable here is that it turns out that the time 
cost is not so much in the creation of the closure but much more the cost 
of binding a function (a closure in this case) to a variable.  It's 
interesting how incredibly slow JavaScript/V8 is at creating a function 
binding rather than just calling one - I calculate 10's of thousands of CPU 
clock cycles; it must be dropping back to evaluating the expression each 
loop rather than JIT compiling it plus the expected time of creating a 
function object on the heap.  This isn't BuckleScript's problem, just that 
BuckleScript can help avoid it for the unwary.

Stepping back, I don't have much to complain about, as before I met 
BuckleScript I would have been writing TypeScript/JavaScript mostly 
imperatively and BuckleScript/OCaml offers the option to do the same, if 
necessary for extremely time critical code!  I've just gotten used to 
avoiding that paradigm.

Yes, let's move the discussion somewhere else as it isn't relevant here. I 
just want to say thanks for the heads-up on BuckleScript, which I am quite 
enjoying in spite of its OCaml syntax foibles.  Now I can develop both Elm 
and BuckleScript inside similar development environments using the Visual 
Studio Code plugins. - Gordon

> On Wed, Jan 4, 2017 at 1:17 AM, GordonBGood <gordo...@gmail.com 
> <javascript:>> wrote:
>
>> Hi Bob/Hongbo,
>>
>> I've run into my first speed problem with BuckleScript:  When it decides 
>> it needs to form a closure of captured free binding(s), it creates a 
>> "function returning a function" with the outer function's arguments being 
>> the current state of the "pure" captured bindings, and the inner returned 
>> function the actual closure code.  When this happens anywhere near an inner 
>> loop, the code gets very slow, although sometimes the compiler is smart 
>> enough to infer that the closure is not necessary is when the binding value 
>> is an argument of an enclosing function; however, this does not happen for 
>> local `let` bindings even when they are in the same scope as the 
>> callee/caller sites.
>>
>> For example, the following bit-packed Sieve of Eratosthenes sieves over a 
>> range and returns the bit packed array of found primes over that range 
>> (odds-only.  It runs about five times slower than using imperative code 
>> for/while loops (top of a quarter million).  Code is as follows:
>>
>> let soe_loop top =
>>   if top < 3 then Array.make 0 0 else
>>   let ndxlmt = (top - 3) lsr 1 in
>>   let cmpsts = Array.make ((ndxlmt lsr 5) + 1) 0 in
>>   for loop = 1 to 1000 do (* do it many times for timing *)
>>     let pir = ref 0 in
>>     while !pir <= ndxlmt do
>>       let pi = !pir in
>>       let p = pi + pi + 3 in
>>       let rec nxtc ci =
>>       if ci > ndxlmt then () else
>>       let w = ci lsr 5 in
>>       cmpsts.(w) <- cmpsts.(w) lor (1 lsl (ci land 31));
>>       nxtc (ci + p) in
>>       let si = (p * p - 3) lsr 1 in
>>       if si > ndxlmt then pir := ndxlmt + 1 else (
>>         if cmpsts.(pi lsr 5) land (1 lsl (pi land 31)) == 0 then
>>           nxtc si
>>   cmpsts
>>
>> where the outer `pi` loops are imperative and the inner `nxtc` loop is 
>> functional; this latter becomes the closure that captures the `p` value as 
>> you can see in the following generated JavaScript:
>>
>> function soe_loop(top) {
>>   if (top < 3) {
>>     return Caml_array.caml_make_vect(0, 0);
>>   }
>>   else {
>>     var ndxlmt = ((top - 3 | 0) >>> 1);
>>     var cmpsts = Caml_array.caml_make_vect((ndxlmt >>> 5) + 1 | 0, 0);
>>     for(var loop = 1; loop <= 1000; ++loop){
>>       var pir = 0;
>>       while(pir <= ndxlmt) {
>>         var pi = pir;
>>         var p = (pi + pi | 0) + 3 | 0;
>>         var nxtc = (function(p){
>>         return function nxtc(_ci) {
>>           while(true) {
>>             var ci = _ci;
>>             if (ci > ndxlmt) {
>>               return /* () */0;
>>             }
>>             else {
>>               var w = (ci >>> 5);
>>               cmpsts[w] = cmpsts[w] | (1 << (ci & 31));
>>               _ci = ci + p | 0<span style="color:#660" 
>> class="m_-2806577415036512984styled-by-p
>>
>

-- 
You received this message because you are subscribed to the Google Groups "Elm 
Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elm-discuss+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [elm-discuss] Re: Elm "faster than JavaScript" (version 0.18) - NOT - Parts I and II...

Reply via email to