On Monday, August 12, 2013 1:06:37 AM UTC+2, spion wrote:
>
> Bruno,
>
>> Gorgi, continuing the discussion here because comments seem to be off on 
>> your blog.
>>
>  
> I haven't turned off the comments - it's a generated static site, and I'm 
> still deciding whether to add something like Disqus to it.
>
>
>> It is great that you took the time to write all these variants of the 
>> same program, and that you analyzed it from different angles. Very well 
>> done and well presented.
>>
>> I just have another comment on the performance comparison: you increased 
>> the I/O latency and concluded that the CPU overhead won't be noticeable in 
>> I/O-bound situations. This is true if you only care about the perceived 
>> latency for an end user, but it is not true if you consider it from a 
>> scalability/cost viewpoint. If you are hosting a service with lots of 
>> users, you care about the number of VMs that you have to deploy to maintain 
>> a given quality of service. If your CPU slices are smaller, you'll be able 
>> to serve more users with fewer CPU cores and your hosting costs will go 
>> down.
>>
>> So, a bench with a timeout of 0 is also interesting because it tells you 
>> how well your service will scale.
>>
>>
> That's why I also presented the t = 1ms benchmark. I don't think it's all 
> that different from t = 0 - but I will try adding a switch for setImmediate 
> and see what comes up.
>

A lot of things can happen in 1ms. In our app the average tick processing 
time is below 100µs. 
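
A switch along these lines would make the t = 0 case easy to test (a 
hypothetical mock, not your actual bench code): setTimeout is clamped to a 
minimum of about 1ms in node, while setImmediate fires on the next turn of 
the event loop, so the t = 0 case really measures pure framework overhead:

    // Hypothetical mock of an async I/O call with a latency switch.
    // latencyMs > 0: setTimeout (clamped to a minimum of ~1ms by node).
    // latencyMs === 0: setImmediate (next event loop turn, no clamp).
    function mockIO(latencyMs, callback) {
        if (latencyMs > 0) {
            setTimeout(function () { callback(null, 'data'); }, latencyMs);
        } else {
            setImmediate(function () { callback(null, 'data'); });
        }
    }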

 
>
>> On Saturday, August 10, 2013 2:26:13 PM UTC+2, Bruno Jouhier wrote:
>>>
>>> Good job Gorgi.
>>>
>>> You have tested a single function making several async calls. It would 
>>> be interesting to also test with several layers of async calls (async f1 
>>> calling async f2 calling async f3 ...). My guess is that you will see a 
>>> significant difference in how fibers compare with the other solutions 
>>> (it would also be interesting to include Marcel's futures library in the 
>>> bench).
>>>
>>> We have recently started to benchmark our application. Our app is now 
>>> about 100 klocs and it is all written in streamline.js. We were running it 
>>> in "callbacks" mode before and we switched to "fibers-fast" mode recently. 
>>> It made a big difference: the application is now 5 times faster and it uses 
>>> 2 or 3 times less memory!!
>>>
>>> Why is that? This is because we have a lot more streamline-to-streamline 
>>> calls than streamline-to-native-I/O calls in our app. In fibers-fast mode, 
>>> the streamline-to-streamline calls are *not* transformed at all: there are 
>>> no wrappers around async function definitions and there are no callback 
>>> closures allocated for these calls; an async function calling another async 
>>> function is just as lean and efficient as a sync function calling another 
>>> sync function. You only get the overhead of closures and fibers at the 
>>> boundaries: when node calls our code (a fiber gets allocated) and when 
>>> our code calls node libraries (yield + allocation of a closure). My guess 
>>> is that most of the speedup comes from the fact that we got rid of all the 
>>> intermediate wrappers and callback closures: a lot less memory allocation 
>>> => a lot less time spent in the GC. All this thanks to deep continuations. 
>>>
>>
> Yes, it would definitely result in measurements closer to real-world 
> scenarios. Similarly, using yield* in generators could be compared. I think 
> your fibonacci benchmark works well for that, but I also want to compare 
> code complexity as well as the subjective feel of writing the code, so I'd 
> rather use real application code and mocked functions.
>

The problem is that it is tedious to create a bench with real application 
code and deeper call stacks. It feels like writing a real app.
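
The shape of the scenario itself is easy to sketch, though (made-up 
functions, just to show the pattern); it's filling it with realistic 
application code that is the tedious part. With plain callbacks, every 
layer pays for one closure allocation:

    // Three layers of async calls: f1 -> f2 -> f3.
    // Each layer allocates a callback closure to wrap the one below.
    function f3(cb) {
        process.nextTick(function () { cb(null, 1); });
    }
    function f2(cb) {
        f3(function (err, r) {
            if (err) return cb(err);
            cb(null, r + 1);
        });
    }
    function f1(cb) {
        f2(function (err, r) {
            if (err) return cb(err);
            cb(null, r + 1);
        });
    }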
 

>
> Oh, and I still haven't considered the debuggability of generators when 
> using yield* - I should also check that. 
>
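
For what it's worth, the yield* variant is interesting because delegation 
keeps the intermediate layers out of the scheduler entirely. A toy example 
(made-up functions, not from any of the benches):

    // With yield*, g1 and g2 delegate directly to g3: the runner driving
    // g1 never sees the intermediate generators as separate async steps.
    function* g3() { return 1; }
    function* g2() { return (yield* g3()) + 1; }
    function* g1() { return (yield* g2()) + 1; }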

>>> We haven't stress-tested the real app in generators mode yet because we 
>>> deploy it on 0.10.x, but I did a quick bench of the multi-layer call 
>>> scenario to compare callbacks, fibers and generators a while ago (
>>> https://gist.github.com/bjouhier/5554200). Note that this bench was run 
>>> with the very first brew of generators (V8 3.19); generators are probably 
>>> faster now and will probably get faster in the future (is Crankshaft 
>>> enabled today??). So these early results should be taken with a pinch of 
>>> salt.
>>>
>>> I'm not saying that fibers will outperform the alternatives in all 
>>> scenarios, but there are application scenarios where they shine.
>>>
>>
> No idea if something has changed in 3.20 with regard to optimization. I 
> tried your benchmark on 0.11.5 (removing fibers, as they don't compile with 
> 0.11.4 or 0.11.5) and it doesn't seem like things have changed much.
>

3.19 was buggy and crashed my bench when modulo was < 3. 
3.20 is much more stable (I haven't seen it crash yet) but from what I've 
heard, generators don't go through Crankshaft optimization yet. So we can 
expect speedups in the future.
 

>
> I will probably wait until 0.12 before making the final measurements and 
> any switching decisions.
>
> In my opinion, it's also important to try other "rawer" generator libraries 
> such as suspend. I don't know exactly how streamline works, but there are 
> several things that can slow generators down, and it also seems like 
> galaxy adds some overhead to enable streamline compatibility. The only 
> way to find out whether that overhead is significant is to compare with 
> other libraries :)
>
>
My bench was less ambitious because I was just comparing the 3 streamline 
modes and native callbacks. I have now introduced fast variants of the 
fibers and generators mode. These variants eliminate the transformation 
overhead in most of the code (all the intermediate calls). I'll adapt the 
bench and rerun it. 
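
To make the "fast" idea concrete, here is a rough sketch of the two 
boundaries with raw fibers (hypothetical helper names, not streamline's 
actual code); everything in between stays plain function calls:

    var Fiber = require('fibers');

    // Boundary 1: node calls our code -> allocate one fiber per entry point.
    function enter(fn) {
        return function () { Fiber(fn).run(); };
    }

    // Boundary 2: our code calls a node API -> park the fiber and resume
    // it from the callback. Assumes the callback fires asynchronously.
    function wait(start) {
        var fiber = Fiber.current;
        start(function (err, result) {
            if (err) fiber.throwInto(err);
            else fiber.run(result);
        });
        return Fiber.yield();
    }

Between enter() and wait(), an async function calling another async 
function is an ordinary call - that's where the allocations disappear.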

Galaxy in itself does not add much overhead. So I'm expecting it to come 
close to suspend and genny, and it may have an advantage in "deep stack" 
scenarios because it won't allocate callback closures for intermediate 
calls (they are handled by a tight while loop inside the galaxy.run 
function). But this is just a guess at this point and it needs to be benched.
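
In spirit, that loop is a generator trampoline along these lines (a minimal 
sketch, not galaxy's actual source). Yielded generators are driven by the 
same while loop; a callback closure is only allocated at real I/O points, 
where the yielded value is a thunk of the form function (cb) {...}:

    function run(gen, callback) {
        var stack = [gen];
        (function step(err, value) {
            while (stack.length) {
                var top = stack[stack.length - 1], r;
                try {
                    r = err ? top.throw(err) : top.next(value);
                    err = null;
                } catch (e) {
                    // propagate the error into the parent generator
                    stack.pop();
                    if (!stack.length) return callback(e);
                    err = e;
                    value = undefined;
                    continue;
                }
                if (r.done) {
                    // return value flows straight to the parent
                    stack.pop();
                    value = r.value;
                    continue;
                }
                if (r.value && typeof r.value.next === 'function') {
                    // intermediate async call: stay in the loop, no closure
                    stack.push(r.value);
                    value = undefined;
                    continue;
                }
                // real I/O point: resume via callback
                return r.value(step);
            }
            callback(null, value);
        })();
    }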

Bruno
