On Tue, Mar 20, 2012 at 22:44, C. Mundi <[email protected]> wrote:
> Hi Matt,
>
> You probably know better than me, but it's not obvious to me that these two
> examples (both interesting) are especially similar.  For one thing, your
> example creates a new buffer on every iteration.  My example leaves
> allocation entirely to streams to decide when to buffer and in what size
> chunks.
>
> And your example runs fast.  I modified your code to read:
>
> test.js
> --------
> power = process.argv[2];
> count = Math.pow(2,power);
> var a = [];
> for (i=0;i<count;i++) {
>   a.push(new Buffer('A'));
> }
> console.error(a.length);
>
>
> and then I did this
>
> $ for ((i=0; i<=20; i+=2)); do echo '----------'; time node test.js $i; done
>
> and got  this:
>
> ----------
> 1
>
> real    0m0.409s
> user    0m0.164s
> sys    0m0.020s
> ----------
> 4
>
> real    0m0.234s
> user    0m0.156s
> sys    0m0.020s
> ----------
> 16
>
> real    0m0.234s
> user    0m0.144s
> sys    0m0.036s
> ----------
> 64
>
> real    0m0.231s
> user    0m0.152s
> sys    0m0.024s
> ----------
> 256
>
> real    0m0.232s
> user    0m0.132s
> sys    0m0.052s
> ----------
> 1024
>
> real    0m0.232s
> user    0m0.152s
> sys    0m0.032s
> ----------
> 4096
>
> real    0m0.257s
> user    0m0.180s
> sys    0m0.028s
> ----------
> 16384
>
> real    0m0.362s
> user    0m0.272s
> sys    0m0.040s
> ----------
> 65536
>
> real    0m0.605s
> user    0m0.464s
> sys    0m0.080s
> ----------
> 262144
>
> real    0m1.689s
> user    0m1.448s
> sys    0m0.160s
> ----------
> 1048576
>
> real    0m6.444s
> user    0m5.840s
> sys    0m0.448s
>
> which is a couple orders of magnitude faster than my example for the same
> upper limits of 2^20.
>
> But remember, my example was not slow to stuff the stream.  The slow part
> was draining the stream to disk.  Now that could be because the VFS was
> pushing back (?) against lots of tiny writes, or it could be because (?)
> node streams are backed with a small buffer and stuffing it forced node to
> scavenge for memory to create a linked list of tiny buffers.  If we look at
> the onset of the scaling near 1 KB, we might imagine that stuffing with 1MB
> would require ~1000 chunks to be scavenged.  That does not seem like a big
> job for malloc on a machine with 2GB and basically no load.
>
> Let's look at your array example.  At the high end, I'm allocating and
> tacking on a million one-byte buffers.  Yet it runs quickly.
>
> So I'm still curious to know what determines how the stream created to write
> to disk actually drains.  That would be node code, right?  I mean, not part
> of V8.

Matt is mostly right. Quoting your code:

    for (i=0; i<Math.pow(2,power); i++) {
      ws.write('A');
    }
    ws.end();           // write EOF

Two things happen here that are expensive:

1. 2^n string-to-buffer conversions (on every write, the string 'A' is
implicitly converted to a buffer).

2. 2^n write requests are queued. Each request is sent to the thread
pool. The thread pool is a bottleneck because there are (usually) only
4 worker threads[1][2]. File I/O is an area where Node still needs
some optimization. :-)
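
As a rough sketch of how both costs could be avoided: build the payload
once and hand it to the stream in a single call, so there is one buffer
and one queued write request instead of 2^n of each. The helper name
writeBatched and the temp-file path are made up for illustration, and
Buffer.alloc() assumes a newer Node.js than the one discussed in this
thread.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: build the whole 2^power-byte payload up front and
// pass it to the stream in one call, i.e. one buffer and one write request.
function writeBatched(ws, power, done) {
  const payload = Buffer.alloc(Math.pow(2, power), 'A');
  ws.end(payload, done); // write the payload, then signal EOF
  return payload.length;
}

// Illustrative usage against a throwaway file in the temp directory.
const file = path.join(os.tmpdir(), 'batched-demo-' + process.pid + '.txt');
writeBatched(fs.createWriteStream(file), 20, function () {
  console.log('bytes on disk:', fs.statSync(file).size);
  fs.unlinkSync(file); // clean up the demo file
});
```

For payloads too large to hold in memory, the same idea applies with
moderately sized chunks rather than a single buffer.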

On a side note: a best practice when writing to streams is to call
stream.write() until it returns false, then listen for the 'drain'
event before you start writing again.
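
A minimal sketch of that pattern, assuming a reasonably current Node.js.
The helper name writeWithBackpressure, the chunk size, and the temp-file
path are illustrative choices, not anything from the original code:

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `count` copies of `chunk`, pausing whenever
// write() returns false and resuming when the stream emits 'drain'.
function writeWithBackpressure(ws, chunk, count, done) {
  let written = 0;
  function pump() {
    while (written < count) {
      written++;
      if (!ws.write(chunk)) {
        // The stream's internal buffer is full: stop and wait for 'drain'.
        ws.once('drain', pump);
        return;
      }
    }
    ws.end(done); // everything is queued; signal EOF
  }
  pump();
}

// Illustrative usage: 1024 chunks of 1 KB each to a throwaway temp file.
const file = path.join(os.tmpdir(), 'drain-demo-' + process.pid + '.txt');
const chunk = Buffer.alloc(1024, 'A');
writeWithBackpressure(fs.createWriteStream(file), chunk, 1024, function () {
  console.log('bytes on disk:', fs.statSync(file).size);
  fs.unlinkSync(file); // clean up the demo file
});
```

Stopping when write() returns false keeps the amount of queued data
bounded instead of letting millions of pending writes pile up in memory.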

[1] To be clear, writes are ordered. Simply increasing the size of the
thread pool won't improve performance of a single WriteStream because
the writes need to be serialized anyway. But if the thread pool is
already running at full capacity, then your file I/O will suffer too.

[2] dns.lookup() is affected too: it calls getaddrinfo() on the thread pool.
