Aha! Thank you, Ben. Your point #2 is especially informative.

Matt's code and mine both show that creating oodles of Buffers (whether explicitly or implicitly) is not a serious bottleneck compared to the queue handling. Processing the resulting queue of writes is what has the impact.

I did not realize the writes were queued. I incorrectly assumed that the writes were filling a preallocated buffer node on a linked list, which would be expanded by chaining as needed. (Hence my wondering about how streams are backed.)

Thank you for answering all of my questions in one swoop!

On Mar 20, 2012 5:44 PM, "Ben Noordhuis" <[email protected]> wrote:
> On Tue, Mar 20, 2012 at 22:44, C. Mundi <[email protected]> wrote:
> > Hi Matt,
> >
> > You probably know better than me, but it's not obvious to me that these two
> > examples (both interesting) are especially similar. For one thing, your
> > example creates a new buffer on every iteration. My example leaves
> > allocation entirely to streams to decide when to buffer and in what size
> > chunks.
> >
> > And your example runs fast. I modified your code to read test
> >
> > test.js
> > --------
> > power = process.argv[2];
> > count = Math.pow(2,power);
> > var a = [];
> > for (i=0;i<count;i++) {
> >   a.push(new Buffer('A'));
> > }
> > console.error(a.length);
> >
> > and then I did this
> >
> > $ for ((i=0; i<=20; i+=2)); do echo '----------'; time node test.js $i; done
> >
> > and got this:
> >
> > ----------
> > 1
> >
> > real 0m0.409s
> > user 0m0.164s
> > sys 0m0.020s
> > ----------
> > 4
> >
> > real 0m0.234s
> > user 0m0.156s
> > sys 0m0.020s
> > ----------
> > 16
> >
> > real 0m0.234s
> > user 0m0.144s
> > sys 0m0.036s
> > ----------
> > 64
> >
> > real 0m0.231s
> > user 0m0.152s
> > sys 0m0.024s
> > ----------
> > 256
> >
> > real 0m0.232s
> > user 0m0.132s
> > sys 0m0.052s
> > ----------
> > 1024
> >
> > real 0m0.232s
> > user 0m0.152s
> > sys 0m0.032s
> > ----------
> > 4096
> >
> > real 0m0.257s
> > user 0m0.180s
> > sys 0m0.028s
> > ----------
> > 16384
> >
> > real 0m0.362s
> > user 0m0.272s
> > sys 0m0.040s
> > ----------
> > 65536
> >
> > real 0m0.605s
> > user 0m0.464s
> > sys 0m0.080s
> > ----------
> > 262144
> >
> > real 0m1.689s
> > user 0m1.448s
> > sys 0m0.160s
> > ----------
> > 1048576
> >
> > real 0m6.444s
> > user 0m5.840s
> > sys 0m0.448s
> >
> > which is a couple orders of magnitude faster than my example for the same
> > upper limits of 2^20.
> >
> > But remember, my example was not slow to stuff the stream. The slow part
> > was draining the stream to disk. Now that could be because the VFS was
> > pushing back (?) against lots of tiny writes, or it could be because (?)
> > node streams are backed with a small buffer and stuffing it forced node to
> > scavenge for memory to create a linked list of tiny buffers. If we look at
> > the onset of the scaling near 1 KB, we might imagine that stuffing with 1MB
> > would require ~1000 chunks to be scavenged. That does not seem like a big
> > job for malloc on a machine with 2GB and basically no load.
> >
> > Let's look at your array example. At the high end, I'm allocating and
> > tacking on a million one-byte buffers. Yet it runs quickly.
> >
> > So I'm still curious to know what determines how the stream created to write
> > to disk actually drains. That would be node code, right? I mean, not part
> > of V8.
>
> Matt is mostly right. Quoting your code:
>
>   for (i=0; i<Math.pow(2,power); i++) {
>     ws.write('A');
>   }
>   ws.end(); // write EOF
>
> Two things happen here that are expensive:
>
> 1. 2^n string to buffer conversions (the string 'A' is implicitly
>    converted to a buffer).
>
> 2. 2^n write requests are queued. Each request is sent to the thread
>    pool. The thread pool is a bottleneck because there are (usually) only
>    4 worker threads[1][2]. File I/O is an area where Node still needs
>    some optimization. :-)
>
> On a side note: a best practice when writing to streams is to call
> stream.write() until it returns false, then listen for the 'drain'
> event before you start writing again.
>
> [1] To be clear, writes are ordered. Simply increasing the size of the
> thread pool won't improve performance of a single WriteStream because
> the writes need to be serialized anyway. But if the thread pool is
> already running at full capacity, then your file I/O will suffer too.
>
> [2] dns.lookup() is affected too, it calls getaddrinfo() from the thread
> pool.
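
For the archives, here is a minimal sketch of how I read Ben's side note: call write() until it returns false, then wait for 'drain' before writing more. The helper name writeRepeated, the pauses counter, and the temp file name are my own inventions for illustration, not anything from node itself, and I am assuming a current node where os.tmpdir() and the 'finish' event are available. Treat it as one possible shape of the pattern rather than the canonical one.

```javascript
'use strict';
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper (not part of node): write `chunk` to `stream`
// `count` times. Whenever write() returns false we stop and resume on
// 'drain'. Calls done(err, pauses) once, where `pauses` is how many
// times we had to wait for 'drain'.
function writeRepeated(stream, chunk, count, done) {
  var written = 0;
  var pauses = 0;
  var called = false;

  function finish(err) {
    if (called) return; // make sure done() is called only once
    called = true;
    done(err, pauses);
  }

  function pump() {
    while (written < count) {
      written++;
      if (!stream.write(chunk)) {
        // The stream's internal queue is full: back off until it drains.
        pauses++;
        stream.once('drain', pump);
        return;
      }
    }
    stream.end(); // everything has been handed to the stream
  }

  stream.on('error', finish);
  stream.on('finish', function () { finish(null); });
  pump();
}

// Same shape as the original experiment: 2^power one-byte writes.
var power = Number(process.argv[2] || 10);
var file = path.join(os.tmpdir(), 'drain-demo.txt'); // assumed scratch file
writeRepeated(fs.createWriteStream(file), 'A', Math.pow(2, power), function (err, pauses) {
  if (err) throw err;
  console.error(fs.statSync(file).size + ' bytes, paused ' + pauses + ' times');
});
```

This does not reduce the number of write requests, so it should not be expected to fix the bottleneck Ben describes in point 2. What it does is keep the queue of pending writes bounded instead of letting all 2^n of them pile up in memory at once.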
--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
