Hi Matt,
You probably know better than me, but it's not obvious to me that these two
examples (both interesting) are especially similar. For one thing, your
example creates a new buffer on every iteration. My example leaves it
entirely to the stream to decide when to buffer and in what size
chunks.
And your example runs fast. I modified your code to read
test.js
--------
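var power = parseInt(process.argv[2], 10); // exponent N from the command line
var count = Math.pow(2, power);            // number of one-byte buffers to allocate
var a = [];
for (var i = 0; i < count; i++) {
  a.push(new Buffer('A'));
}
console.error(a.length);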
and then I did this
$ for ((i=0; i<=20; i+=2)); do echo '----------'; time node test.js $i; done
and got this:
----------
1
real 0m0.409s
user 0m0.164s
sys 0m0.020s
----------
4
real 0m0.234s
user 0m0.156s
sys 0m0.020s
----------
16
real 0m0.234s
user 0m0.144s
sys 0m0.036s
----------
64
real 0m0.231s
user 0m0.152s
sys 0m0.024s
----------
256
real 0m0.232s
user 0m0.132s
sys 0m0.052s
----------
1024
real 0m0.232s
user 0m0.152s
sys 0m0.032s
----------
4096
real 0m0.257s
user 0m0.180s
sys 0m0.028s
----------
16384
real 0m0.362s
user 0m0.272s
sys 0m0.040s
----------
65536
real 0m0.605s
user 0m0.464s
sys 0m0.080s
----------
262144
real 0m1.689s
user 0m1.448s
sys 0m0.160s
----------
1048576
real 0m6.444s
user 0m5.840s
sys 0m0.448s
which is a couple of orders of magnitude faster than my example at the same
upper limit of 2^20.
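The flat ~0.23 s for the small cases above looks like process startup rather
than loop time, though that is an assumption. A minimal sketch that times only
the loop in-process, assuming process.hrtime() is available in the node
version at hand; timeAllocation is a made-up helper name:

```javascript
// Time only the allocation loop, excluding node's startup cost.
// `timeAllocation` is a hypothetical helper, not part of the original test.
function timeAllocation(power) {
  var count = Math.pow(2, power);
  var a = [];
  var start = process.hrtime();
  for (var i = 0; i < count; i++) {
    // Buffer.from exists only in newer node versions; fall back to the
    // constructor used in the original test otherwise.
    a.push(Buffer.from ? Buffer.from('A') : new Buffer('A'));
  }
  var elapsed = process.hrtime(start); // [seconds, nanoseconds]
  return { length: a.length, ms: elapsed[0] * 1e3 + elapsed[1] / 1e6 };
}

var result = timeAllocation(parseInt(process.argv[2], 10) || 0);
console.error(result.length, result.ms.toFixed(3) + ' ms');
```

Run it the same way as test.js; the reported milliseconds cover the loop only.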
But remember, my example was not slow to stuff the stream. The slow part
was draining the stream to disk. Now that could be because the VFS was
pushing back (?) against lots of tiny writes, or it could be because (?)
node streams are backed with a small buffer and stuffing it forced node to
scavenge for memory to create a linked list of tiny buffers. If we look at
the onset of the scaling near 1 KB, we might imagine that stuffing with 1 MB
would require ~1000 chunks to be scavenged. That does not seem like a big
job for malloc on a machine with 2 GB and basically no load.
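To make the chunk arithmetic concrete: 2^20 bytes in 1 KB pieces is 2^20 / 2^10
= 1024 chunks. The sketch below groups one-byte buffers into chunks of a given
size. It is only an illustration of the count, not a claim about how node's
streams are implemented; coalesce is a hypothetical helper, and it assumes
Buffer.concat is available.

```javascript
// Group many small buffers into chunks of at least `chunkSize` bytes each
// (the final chunk may be smaller). `coalesce` is a hypothetical helper.
function coalesce(buffers, chunkSize) {
  var chunks = [];
  var pending = [];
  var pendingBytes = 0;
  for (var i = 0; i < buffers.length; i++) {
    pending.push(buffers[i]);
    pendingBytes += buffers[i].length;
    if (pendingBytes >= chunkSize) {
      // Enough bytes collected: merge them into one contiguous buffer.
      chunks.push(Buffer.concat(pending, pendingBytes));
      pending = [];
      pendingBytes = 0;
    }
  }
  // Flush whatever is left over as a final, possibly short, chunk.
  if (pendingBytes > 0) chunks.push(Buffer.concat(pending, pendingBytes));
  return chunks;
}
```

Feeding it the array `a` from test.js with a chunkSize of 1024 gives
count / 1024 chunks whenever count is a multiple of 1024.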
Let's look at your array example. At the high end, I'm allocating and
tacking on a million one-byte buffers. Yet it runs quickly.
So I'm still curious to know what determines how the stream created to
write to disk actually drains. That would be node code, right? I mean,
not part of V8.
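For comparison with stuffing the stream all at once, here is a minimal sketch
of a writer that stops when write() returns false and resumes on the 'drain'
event. writeAll is a hypothetical helper and the file name in the usage comment
is made up; whether this changes the drain rate on my setup is untested.

```javascript
// Write chunks one at a time, pausing whenever the stream reports that its
// internal buffer is full (write() returns false) and resuming on 'drain'.
// `writeAll` is a hypothetical helper name used only for illustration.
function writeAll(stream, chunks, done) {
  var i = 0;
  function next() {
    while (i < chunks.length) {
      var ok = stream.write(chunks[i++]);
      if (!ok) {
        // Buffer is full: wait for it to flush before writing more.
        stream.once('drain', next);
        return;
      }
    }
    // Every chunk has been handed to the stream; close it.
    stream.end();
    if (done) done();
  }
  next();
}

// Example use (assumed file name):
// var fs = require('fs');
// var out = fs.createWriteStream('out.dat');
// writeAll(out, chunks, function () { console.error('all chunks queued'); });
```

Note that done fires once the last chunk has been queued and end() called, not
when the data has reached the disk.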
Thanks!
On Tue, Mar 20, 2012 at 2:02 PM, Matt <[email protected]> wrote:
> Try this test instead, I bet it gives you the same performance
> characteristics:
>
> var a = [];
> for (i=0; i<Math.pow(2,power); i++) {
> a.push(new Buffer('A'));
> }
>
> Which is the same thing you're making Stream do. I bet all the overhead is
> in that.
>
> On Tue, Mar 20, 2012 at 4:00 PM, C. Mundi <[email protected]> wrote:
>
>> Yes. I am deliberately abusing the stream with the goal of understanding
>> behavior.
>>
>> I would like to understand why it drains so slowly to disk, on the order
>> of a few KB/sec. I would imagine that a stream created specifically with a
>> file as its drain would be implemented to dump the biggest chunks it could
>> on every write to the filesystem. Hence I wonder whether the low rate is
>> really a node stream behavior or a virtual filesystem (buffering)
>> behavior. I guess I also need to ask how node's own streams are backed.
>>
>> I really want to understand just enough to predict bounds on scaling.
>> On Mar 20, 2012 12:30 PM, "Matt" <[email protected]> wrote:
>>
>>> You're not "streaming" data into the stream. You're pumping it full as
>>> quickly as possible and letting it drain in its own time. Obviously this is
>>> sub-optimal.
>>>
>>> On Tue, Mar 20, 2012 at 3:20 PM, C. Mundi <[email protected]> wrote:
>>>
>>>>
>>>> Hi. I am trying to learn how to use streams properly in node.
>>>>
>>>> The attached script takes a single argv parameter N and writes 2^N
>>>> bytes to a file via a stream.
>>>>
>>>> I expect time behavior O(2^N). But what I am seeing is scaling faster.
>>>>
>>>> I collected data like this in bash on Linux 3.0.0-16 x64:
>>>>
>>>> for i in {0..20}; do time node simple.js $i; done
>>>>
>>>> and plotted the results in the attached spreadsheet (OpenOffice format).
>>>>
>>>> Watching it execute, I see that almost all of the time for N>10 is
>>>> spent waiting for the stream to close. It seems to be draining very
>>>> slowly! The maximum memory use by node at any time is 180 MB, less than
>>>> 10% of physical memory. For N greater than about 10, node takes 100% CPU
>>>> while closing the stream. The system is otherwise not under load. The
>>>> system is not swapping much if at all.
>>>>
>>>> The straight lines in the plot are a guide to the eyes -- not a fit!
>>>> You can see that the time behavior is superlinear on the log-log plot.
>>>>
>>>> If I were using a lot of memory, I would expect the GC to kick in and
>>>> then all bets are off. But my biggest run (N=20) is only a megabyte.
>>>>
>>>> Could the pushback be coming from the Linux VFS? The scaling of the
>>>> system time makes me wonder.
>>>>
>>>> Please help me understand why I should not do what I've done and where
>>>> to learn about how to use streams efficiently in node.
>>>>
>>>> Note: The code is a demonstration. I would not typically create files
>>>> this way.
>>>>
>>>> Thanks for cluing the noob (again).
>>>>
>>>>
>>>> --
>>>> Job Board: http://jobs.nodejs.org/
>>>> Posting guidelines:
>>>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>>>> You received this message because you are subscribed to the Google
>>>> Groups "nodejs" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/nodejs?hl=en?hl=en
>>>>