Well, using Buffer is not a good choice for my use case, because in my
project the bottleneck is not the data itself but the data index. I use a
V8 object as a sort of in-memory DB.
If I had to use Buffer or manage the index/hash myself, I would probably
do it in C++ or with Redis. But that would probably not give better, or
even equivalent, performance compared to V8.

@back2dos your suggestion is in fact my first choice if it's unlikely that
I can break through the heap limit. A logic-level map-reduce is not
possible because my data structure is a net, but down at the in-memory DB
layer I can split the data and load it into several V8 processes. That
actually works without much penalty, and may even give a performance gain
when the CPU becomes the bottleneck.
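
A minimal sketch of the "split the data across several V8 processes" idea, assuming keys are routed by a stable hash. The names shardFor and createShardedStore are hypothetical, and plain in-process objects stand in for the child processes so the example runs on its own; in a real setup each shard would be a separate node process reached over IPC.

```javascript
// Hypothetical helper: map a string key to one of `shardCount` shards.
// Uses a 32-bit FNV-1a hash; any stable hash would do.
function shardFor(key, shardCount) {
    var hash = 0x811c9dc5; // FNV offset basis
    for (var i = 0; i < key.length; i++) {
        hash ^= key.charCodeAt(i);
        // Multiply by the FNV prime and keep the result as unsigned 32-bit.
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash % shardCount;
}

// Assumption: each shard would really be its own V8 process. Plain
// prototype-less objects stand in for those processes here.
function createShardedStore(shardCount) {
    var shards = [];
    for (var i = 0; i < shardCount; i++) {
        shards.push(Object.create(null));
    }
    return {
        set: function (key, value) {
            shards[shardFor(key, shardCount)][key] = value;
        },
        get: function (key) {
            return shards[shardFor(key, shardCount)][key];
        },
        // Number of keys held by each shard, to check the distribution.
        sizes: function () {
            return shards.map(function (s) { return Object.keys(s).length; });
        }
    };
}

var store = createShardedStore(4);
for (var n = 0; n < 1000; n++) {
    store.set("key" + n, { id: n });
}
console.log(store.sizes());
```

Because the shard is derived from the key alone, every process can work out where a key lives without a shared lookup table.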

Another question: why does V8 have this hard heap size limit?

By the way, during my tests I found two interesting (and weird) things.

// Build a random lowercase string with exactly `length` characters.
var randomStringByLength = function (length) {
    var str = "";
    for (var i = 0; i < length; i++) {
        var offset = 97 + Math.floor(Math.random() * 26); // 'a'..'z'
        str += String.fromCharCode(offset);
    }
    return str;
};
var memberLength = 100 * 1000 * 1000;
console.time("total");
console.log("init big object of " + memberLength + ", with random string at " +
    "length 32 as key and an object as value");
console.time("1/100");
var obj = {};
for (var index = 0; index < memberLength; index++) {
    // Report the time taken by each 1% (1M members) of the insertions.
    if (index % (memberLength / 100) === 0) {
        console.timeEnd("1/100");
        console.time("1/100");
        console.log(index / (memberLength / 100));
    }
    obj[randomStringByLength(32)] = {};
}
console.log("build time:");
console.timeEnd("total");
//END

It dies after about 10M (10/100) empty objects have been assigned to the
object (at about 1.9G of memory use, which is no surprise).
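
To watch memory grow as members are added, one option is to sample process.memoryUsage() during the loop. This is a scaled-down sketch, not the original test: heapUsedMB is a hypothetical helper name, and the member count is reduced so it finishes quickly.

```javascript
// Hypothetical helper: current V8 heap usage, rounded to whole megabytes.
function heapUsedMB() {
    return Math.round(process.memoryUsage().heapUsed / (1024 * 1024));
}

var sampleCount = 100000; // scaled down from the 100M used in the test above
var sample = {};
for (var i = 0; i < sampleCount; i++) {
    // Log heap usage once per 10% of the insertions.
    if (i % (sampleCount / 10) === 0) {
        console.log(i + " members, heapUsed ~" + heapUsedMB() + " MB");
    }
    sample["key" + i] = {};
}
console.log(sampleCount + " members, heapUsed ~" + heapUsedMB() + " MB");
```

Note that heapUsed covers only the V8 heap, so it will be lower than the process size reported by the operating system.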

But look at the result.
init big object of 100000000, with random string at length 32 as key and an 
object as value
1/100: 0ms
0
1/100: 2981ms
1
1/100: 2647ms
2
1/100: 4254ms
3
1/100: 2348ms
4
1/100: 2393ms
5
1/100: 3880ms
6
1/100: 2488ms
7
1/100: 2552ms
8
1/100: 4740ms
9
1/100: 2673ms
10
1/100: 8279ms
11
FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory
Here is the first interesting thing.
It's not a big surprise that the average time to add 1M members to an
object shows no significant growth as the member count increases. V8
presumably uses a hash table or something similar to optimize this.
But the time for each batch of 1M adds is quite stable across runs. Say,
when adding the 4th 1M of data to the object, it always takes about
2200ms~2600ms. And on a slower machine, the average time scales at a
certain rate.
The result hints that the time taken to add a value is affected by the
number of object members, but does not simply grow with that number. It is
more likely decided by the index of the value you add: say I'm adding the
10000th value to the object using a random 32-byte key, then the time taken
is fixed, no matter what the keys of the previously added values are.
The hash strategy behind the scenes must be interesting.
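
One possible explanation, offered only as an assumption and not as a description of V8's actual implementation: a hash table that grows by rehashing does the expensive work when the entry count crosses a threshold, regardless of which keys were inserted. The toy table below (ToyTable and resizePoints are hypothetical names) doubles its capacity when it would become more than 3/4 full and records the insertion count at which each resize happens.

```javascript
// Toy open-addressing hash table. NOT V8's implementation; it only
// illustrates that resize points depend on the entry count, not the keys.
function ToyTable() {
    this.capacity = 8;
    this.count = 0;
    this.slots = new Array(this.capacity);
    this.resizeAt = []; // insertion counts at which a resize occurred
}

// Simple multiplicative string hash, kept as an unsigned 32-bit value.
ToyTable.prototype.hash = function (key) {
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) >>> 0;
    }
    return h;
};

// Linear probing; returns true if the key was not already present.
ToyTable.prototype.place = function (slots, key) {
    var idx = this.hash(key) % slots.length;
    while (slots[idx] !== undefined && slots[idx] !== key) {
        idx = (idx + 1) % slots.length;
    }
    var isNew = slots[idx] === undefined;
    slots[idx] = key;
    return isNew;
};

ToyTable.prototype.add = function (key) {
    // Grow before the table would become more than 3/4 full.
    if ((this.count + 1) * 4 > this.capacity * 3) {
        var bigger = new Array(this.capacity * 2);
        for (var i = 0; i < this.slots.length; i++) {
            if (this.slots[i] !== undefined) {
                this.place(bigger, this.slots[i]); // the expensive rehash
            }
        }
        this.slots = bigger;
        this.capacity *= 2;
        this.resizeAt.push(this.count + 1);
    }
    if (this.place(this.slots, key)) {
        this.count++;
    }
};

// Insert n distinct keys with the given prefix and report the resize points.
function resizePoints(prefix, n) {
    var table = new ToyTable();
    for (var i = 0; i < n; i++) {
        table.add(prefix + i);
    }
    return table.resizeAt;
}

// Two different key sets produce the same list of resize points.
console.log(resizePoints("a", 100));
console.log(resizePoints("b", 100));
```

If something similar happens inside V8, it would match the observation that the slow batches land at the same member counts on every run.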

Then comes the second interesting, and weird, thing:
when the value is null instead of an empty object.
//obj[randomStringByLength(32)] = {};
obj[randomStringByLength(32)] = null;
Then the program becomes extraordinarily (200x) slow at 5M-6M but recovers at 7M.
Maybe nulling a member causes an unwanted GC walk-through? I'm just wondering.
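
A scaled-down harness for comparing the two value types side by side might look like the following. It is a sketch only: randomKey and timeInserts are hypothetical names, and the count is far below the 5M-7M range where the slowdown was seen, so it will not reproduce the effect unless the count is raised.

```javascript
// Hypothetical helper: random lowercase key of exactly `length` characters.
function randomKey(length) {
    var str = "";
    for (var i = 0; i < length; i++) {
        str += String.fromCharCode(97 + Math.floor(Math.random() * 26));
    }
    return str;
}

// Insert `count` random keys, using makeValue() for each value, and
// return the elapsed wall-clock time plus the resulting member count.
function timeInserts(count, makeValue) {
    var target = {};
    var start = Date.now();
    for (var i = 0; i < count; i++) {
        target[randomKey(32)] = makeValue();
    }
    return { ms: Date.now() - start, size: Object.keys(target).length };
}

var insertCount = 100000; // scaled down; raise it to probe the slow range
var withObjects = timeInserts(insertCount, function () { return {}; });
var withNulls = timeInserts(insertCount, function () { return null; });
console.log("{}   values: " + withObjects.ms + "ms");
console.log("null values: " + withNulls.ms + "ms");
```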


On Wednesday, August 21, 2013 9:57:01 PM UTC+8, back2dos wrote:
>
> This may be a stupid question, but how about distributing the workload 
> to multiple node processes? 
>
> This should sidestep the memory barrier and also happens to leverage 
> all cores. IPC is pretty straight forward with node 
> (
> http://nodejs.org/api/child_process.html#child_process_child_send_message_sendhandle).
>  
>
> And should the amount of data ever grow to surpass the capabilities of 
> one machine, you can swap out the IPC for some network protocol and 
> run the stuff on multiple machines. 
>
> Without knowing more about the problem, I would suggest aiming for a 
> map-reduce-ish information flow between the distinct processes. Maybe 
> this is a good starting point: http://www.mapjs.org/ 
>
> Regards, 
> Juraj 
>
