Well, using a Buffer is not a good choice for my use case, because in my project the bottleneck is not the data itself but the data index. I use a V8 object as a sort of in-memory db. If I had to use Buffers or manage the index/hash myself, I would likely do it in C++ or with Redis, but I probably would not get better, or even equivalent, performance compared to V8.

@back2dos your suggestion is in fact my first choice if it turns out that I can't break through the heap limit. A logic-level map-reduce is not possible because my data structure is a net, but down at the in-memory db layer I can split the data and load it into several V8 processes. This actually works without much penalty, and may even bring a performance gain when the CPU becomes the bottleneck.

Another question: why does V8 have this hard heap size limit?

By the way, during my test I found 2 interesting (and weird) things.

    // Build a random lowercase string of the given length.
    var randomStringByLength = function(length) {
        var str = "";
        for (var i = 0; i < length; i++) {
            var offset = 97 + Math.floor(Math.random() * 26);
            str += String.fromCharCode(offset);
        }
        return str;
    };

    var memberLength = 100 * 1000 * 1000;
    console.time("total");
    console.log("init big object of " + memberLength + ", with random string at length 32 as key and an object as value");
    console.time("1/100");
    var obj = {};
    for (var index = 0; index < memberLength; index++) {
        // Every 1% of the way, print the time taken by the previous 1M insertions.
        if (index % (memberLength / 100) === 0) {
            console.timeEnd("1/100");
            console.time("1/100");
            console.log(index / (memberLength / 100));
        }
        obj[randomStringByLength(32)] = {};
    }
    console.log("build time:");
    console.timeEnd("total");
    //END

It ends at about 10M (10/100) empty objects assigned to the object (about 1.9G of memory in use, no surprise there). But look at the result.

    init big object of 100000000, with random string at length 32 as key and an object as value
    1/100: 0ms
    0
    1/100: 2981ms
    1
    1/100: 2647ms
    2
    1/100: 4254ms
    3
    1/100: 2348ms
    4
    1/100: 2393ms
    5
    1/100: 3880ms
    6
    1/100: 2488ms
    7
    1/100: 2552ms
    8
    1/100: 4740ms
    9
    1/100: 2673ms
    10
    1/100: 8279ms
    11
    FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory

Here is the first interesting thing. It's not a big surprise that the average time to add 1M members to an object shows no significant growth as the member count increases. V8 presumably uses a hash table or something similar to optimize this.
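
In case anyone wants to reproduce the per-batch timing without running into the heap limit, here is a scaled-down sketch of the same loop. The names timeBatches, batchCount, batchSize and makeValue are mine and not part of the test above, and the sizes are assumptions chosen to stay far below the limit. It also records process.memoryUsage().heapUsed after each batch, so the memory growth can be watched next to the timings.

```javascript
// Scaled-down sketch of the test above (sizes are assumptions, not the original 100M).
function randomStringByLength(length) {
    var str = "";
    for (var i = 0; i < length; i++) {
        str += String.fromCharCode(97 + Math.floor(Math.random() * 26));
    }
    return str;
}

// Insert batchCount batches of batchSize random keys into one object and
// return, for each batch, the elapsed milliseconds and the heap in use.
// makeValue is a hypothetical hook so that different values can be tried.
function timeBatches(batchCount, batchSize, makeValue) {
    var obj = {};
    var results = [];
    for (var batch = 0; batch < batchCount; batch++) {
        var start = process.hrtime();
        for (var i = 0; i < batchSize; i++) {
            obj[randomStringByLength(32)] = makeValue();
        }
        var diff = process.hrtime(start); // [seconds, nanoseconds]
        results.push({
            batch: batch,
            ms: diff[0] * 1000 + diff[1] / 1e6,
            heapUsedMB: process.memoryUsage().heapUsed / (1024 * 1024)
        });
    }
    return results;
}

// 10 batches of 10k keys each, with an empty object as the value.
var withObjects = timeBatches(10, 10000, function () { return {}; });
withObjects.forEach(function (r) {
    console.log(r.batch, r.ms.toFixed(1) + "ms", r.heapUsedMB.toFixed(1) + "MB");
});
```

Passing a different makeValue makes it easy to compare value types at the same scale. The absolute numbers will of course differ from the 1M batches above.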
What is more interesting is that the time for each batch of 1M insertions is quite stable from run to run. For example, adding the 4th 1M entries to the object always takes about 2200ms~2600ms. On a slower machine, the times scale by a roughly constant factor. This hints that the time taken to add a value is affected by the number of members in the object, but does not simply grow with it. It seems more likely to be determined by the index of the value being added: say I'm adding the 10000th value to the object using a random 32-byte key, then the time taken is fixed, no matter what the keys of the previously added values are. The hash strategy behind the scenes must be interesting.

Then comes the second interesting thing, and a weird one: what happens when the value is null instead of an empty object.

    //obj[randomStringByLength(32)] = {};
    obj[randomStringByLength(32)] = null;

Then the program becomes extraordinarily (200x) slow at 5M-6M but recovers at 7M. Maybe setting a member to null causes an unwanted GC walk-through? I'm just wondering.

On Wednesday, August 21, 2013 9:57:01 PM UTC+8, back2dos wrote:
>
> This may be a stupid question, but how about distributing the workload
> to multiple node processes?
>
> This should sidestep the memory barrier and also happens to leverage
> all cores. IPC is pretty straight forward with node
> (http://nodejs.org/api/child_process.html#child_process_child_send_message_sendhandle).
>
> And should the amount of data ever grow to surpass the capabilities of
> one machine, you can swap out the IPC for some network protocol and
> run the stuff on multiple machines.
>
> Without knowing more about the problem, I would suggest aiming for a
> map-reduce-ish information flow between the distinct processes. Maybe
> this is a good starting point: http://www.mapjs.org/
>
> Regards,
> Juraj
>
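
A minimal sketch of the split I have in mind, following the quoted suggestion: each key is hashed to pick one of N slices, and every slice holds only its own part of the index. hashKey, shardFor, ShardedIndex and the shard count of 4 are hypothetical names and values of mine. In a real setup each slice would presumably live in its own process started with child_process.fork and be reached through child.send; that part is left out here so the routing itself stays easy to check.

```javascript
// Simple 32-bit string hash (hash * 31 + char code). This is an assumption
// picked for illustration, not what V8 uses internally.
function hashKey(key) {
    var hash = 0;
    for (var i = 0; i < key.length; i++) {
        hash = (hash * 31 + key.charCodeAt(i)) | 0; // keep within int32 range
    }
    return hash >>> 0; // reinterpret as unsigned so the modulo is never negative
}

// Pick the slice (0 .. shardCount - 1) that owns a key.
function shardFor(key, shardCount) {
    return hashKey(key) % shardCount;
}

// In-process stand-in for "one index slice per V8 process".
function ShardedIndex(shardCount) {
    this.shards = [];
    for (var i = 0; i < shardCount; i++) {
        // No prototype, so keys like "constructor" behave as ordinary entries.
        this.shards.push(Object.create(null));
    }
}

ShardedIndex.prototype.set = function (key, value) {
    this.shards[shardFor(key, this.shards.length)][key] = value;
};

ShardedIndex.prototype.get = function (key) {
    return this.shards[shardFor(key, this.shards.length)][key];
};

var index = new ShardedIndex(4);
index.set("alice", { friends: ["bob"] });
index.set("bob", { friends: ["alice"] });
console.log(index.get("alice").friends, "lives in shard", shardFor("alice", 4));
```

Because the data is a net, following a link (alice to bob here) may cross slices, which is why the split only works at the storage layer and not as a logic-level map-reduce.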