No worries, I'm on Twitter and GitHub as "miksago".

1. Doing things pull-based is a possibly new way of thinking about realtime
communication. I haven't yet seen it proven, but I think it makes sense: it
means that if a server starts getting overloaded, it can throttle its incoming
load rather than killing the rest of the servers in your cluster (the classic
situation being broadcast messages).
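Roughly what I mean by pull-based, as a sketch (the Mailbox type and its API are made up purely for illustration; a real system would back this with Kafka, ZMQ, or similar):

```javascript
// Hypothetical sketch of pull-based delivery: producers enqueue into a
// per-channel mailbox, and each consuming server pulls at its own rate.
// An overloaded server throttles itself by pulling smaller batches,
// instead of being flooded by pushed broadcasts.

class Mailbox {
  constructor() { this.queue = []; }
  push(msg) { this.queue.push(msg); }              // producers enqueue
  pull(max) { return this.queue.splice(0, max); }  // consumer takes at most `max`
}

const mailbox = new Mailbox();
for (let i = 0; i < 10; i++) mailbox.push({ id: i });

// A busy consumer pulls a small batch now and leaves the rest for later.
const batch = mailbox.pull(3);
console.log(batch.length, mailbox.queue.length); // 3 7
```

The key property is that backpressure lives on the consumer side: a struggling server slows down its own pulls, and the backlog sits in the mailbox instead of overwhelming the process.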

2. I don't think msgpack is a protocol (in the sense of the word I was using).
Internally, I would use a more structured data format, such as Protobuf, which
has a fairly strict schema declaration and parser. Msgpack is more akin to
JSON, in that it's just a data format, not a data protocol; it's the way you
use it that makes it a protocol.

The protocols I was talking about were WebSocket sub-protocols, which are
pretty specific to your application or domain.
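To make the distinction concrete, here is a minimal sketch of the kind of application-level envelope that turns a plain data format into a sub-protocol (JSON-encoded for readability; msgpack or Protobuf would slot into the same place, and all field names here are hypothetical):

```javascript
// A data format only becomes a protocol once you fix an envelope and a set
// of message types on top of it. This sketch uses a versioned JSON envelope.

const PROTOCOL_VERSION = 1;

function encode(type, payload) {
  return JSON.stringify({ v: PROTOCOL_VERSION, type, payload });
}

function decode(raw) {
  const msg = JSON.parse(raw);
  if (msg.v !== PROTOCOL_VERSION) throw new Error('unsupported version');
  if (typeof msg.type !== 'string') throw new Error('missing message type');
  return msg;
}

const wire = encode('chat.message', { room: 'lobby', text: 'hi' });
const msg = decode(wire);
console.log(msg.type); // chat.message
```

The version field and the type registry are the "protocol" part: they are what let both ends evolve independently and reject frames they don't understand.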

3. I would go with a max of 25-75K concurrents per server process in that
case, which would mean 16 to 40 server processes. (Most likely you'd have
those 16 segmented as 4 servers * 4 processes, assuming 4 cores each.)
Essentially, you want to keep the load on any single server from getting
incredibly high; it's better to scale out horizontally a little more than you
need, and then treat the headroom up to each server's high watermark as
"burst capacity".
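For clarity, the sizing arithmetic is just division and rounding up; the numbers below are illustrative, assuming a hypothetical target of 1M concurrents and the 25-75K per-process range above:

```javascript
// Back-of-envelope capacity planning: divide target concurrents by the
// per-process ceiling, then round up. The target figure is an assumption
// for illustration, not a recommendation.

const target = 1000000;        // total concurrent connections (hypothetical)
const perProcessLow = 25000;   // conservative per-process ceiling
const perProcessHigh = 75000;  // optimistic per-process ceiling

const processesMax = Math.ceil(target / perProcessLow);   // worst case
const processesMin = Math.ceil(target / perProcessHigh);  // best case

console.log(processesMin, processesMax); // 14 40
```

Sizing against the conservative end of the range is what buys you the burst capacity: the gap between normal load and each process's high watermark absorbs the spikes.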

That said, I would be surprised if anyone is really reaching close to 500K
concurrents on a single application (that's a number I'd expect from a
realtime service provider).

As for dealing with more servers, that's where something like Apache Kafka
comes in; however, I'm still uncertain about using Kafka. You could also go
the route of mesh networking with ZMQ, which does work fairly well, but the
setup and development of it is more complex: every server would talk to every
other server.

You don't want to be using broadcast messages if possible. That is, if you go
with the pull-based setup, each server would have a mailbox per channel on
your chat system, and servers would pull from only the servers and mailboxes
they are interested in. Likewise, if you go the route of central brokers (not
that I recommend that), you can structure your queues and their key spaces
into segments representing something like "chats:{CHAT ID}", or perhaps even
"{PID / SERVER ID}:chats:{CHAT ID}". This would mean that servers listen on
only a subset of messages, and don't receive every message in the system.
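A sketch of that key-space segmentation (the in-memory `Bus` stands in for whatever broker or transport you actually use, and the channel naming is the hypothetical scheme above):

```javascript
// Segmenting the key space: each server subscribes only to the channels it
// hosts users for, so it never sees traffic for unrelated chats.

function chatChannel(chatId) { return `chats:${chatId}`; }

class Bus {
  constructor() { this.subscribers = new Map(); } // channel -> handlers
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, []);
    this.subscribers.get(channel).push(handler);
  }
  publish(channel, msg) {
    (this.subscribers.get(channel) || []).forEach(h => h(msg));
  }
}

const bus = new Bus();
const delivered = [];

// This server only hosts users in chats 1 and 2, so it ignores chat 3.
bus.subscribe(chatChannel(1), m => delivered.push(m));
bus.subscribe(chatChannel(2), m => delivered.push(m));

bus.publish(chatChannel(1), 'hello');
bus.publish(chatChannel(3), 'ignored');
console.log(delivered); // [ 'hello' ]
```

The same idea applies whether the channels are broker queues, Kafka topics, or ZMQ subscriptions: the subscription filter, not a broadcast, decides what reaches each server.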

(Hopefully that last part makes sense; I was a bit crammed for time writing it.)

– Micheil

On 07/05/2012, at 12:52 PM, jason.桂林 wrote:

> Thanks Micheil, what you said is very professional. Do you have a Twitter or
> G+ account? I'd like to follow you, heh.
> 
> 1. What you said about pull-based rather than push-based looks like a new
> way of thinking, but I can understand why you said it. I have thought a lot
> about push-based message broadcast, and it is very complex. Maybe pull-based
> will be a very simple and beautiful solution.
> 
> 2. You mentioned a transport protocol. I'd like to use msgpack as the
> protocol, but I need help with it, because msgpack doesn't compress strings,
> and I'm also afraid there are some security problems.
> 
> 3. Regarding "I would recommend looking into using more servers with lower
> load versus fewer servers with higher load": I'd like to talk more about this.
> 
> We have to use more servers for scaling, but more servers means more
> complexity. Unlike other web applications, a realtime service needs
> communication between servers. Say we have 1M users spread across 1K
> servers, with 1K users on each server. When 1 user sends a message in a
> room, that message is sent to the other users. In the worst case the
> senders' and receivers' servers cover all 1K servers, so the message must
> be sent to all 1K servers.
> 
> If 100 users (10%) on each of the 1K servers send such a worst-case message
> at the same time, each server will receive 100K messages at once. It's
> horrible.
> 
> How do we prevent this from happening?
> 
> 
> 2012/5/6 Micheil Smith <mich...@brandedcode.com>
> If you have millions of users online, I think you'll be facing other
> problems than just Socket.io; some old-ish benchmarks showed socket.io
> maxing out at around 5-20K concurrents in a single process, while other
> websocket servers performed differently. If you're serious about scaling
> realtime infrastructure, then you should probably have a look at talks from
> the Keeping It Realtime Conference (http://2011.krtconf.com/), as well as
> looking into the Autobahn Test Suite benchmarks.
> 
> Things to be cautious of:
> 
>    - You'll need a way to do load balancing (traditional load balancers
>      tend to fail pretty hard with WebSockets or persistent connections).
> 
>    - I would NOT recommend using redis or any other centralised message
>      bus. This is by far the easiest way to do scaling across multiple
>      servers; however, it's also the easiest way to shoot yourself in the
>      foot if the message bus goes down (process crash, server network
>      isolation, etc.).
> 
>    - I would recommend looking into using more servers with lower load
>      versus fewer servers with higher load; this will enable you to scale
>      much better in short bursts. (Experience tells me that generally your
>      application or service will have peaks and troughs in usage, and these
>      tend to match up with the three main timezone blocks: US, GMT, and
>      East Asian / Oceanic.)
> 
> Those points aside, getting above 100K concurrent users tends to be
> incredibly hard; some of the largest apps around that I've seen have only
> just been pushing 250K (we're talking big service providers that have 500K
> to 2M users; I can't name them due to legal reasons).
> 
> As for storage of data, you will most likely need both realtime
> communication between servers and some sort of key/value store for things
> like presence information and authentication tokens. For the storage of
> data, I would actually recommend redis; it tends to scale out really well
> for master/slave type setups. As for message communication, I'm beginning
> to think that pull-based may be better than push-based, so something like
> Apache Kafka (not that I've had personal experience with it).
> 
> You will most likely also want to define a transport protocol on top of
> your connection, dependent on the type of your application. There aren't
> many resources on doing this, but if you want help with that, give me a
> shout; I've done a lot of research into that area over the last two years.
> 
> Alternatively, you could look at third-party services for scaling your
> realtime architecture. At present, given the information I have on various
> services, I would be inclined to recommend PubNub (http://pubnub.com); they
> appear to have a very high quality setup. (Disclaimer: I did work for a
> competitor in the past, but that does not bias my choice.) Another option
> is Pusher (http://pusher.com), or for more, you can look here:
> http://www.leggetter.co.uk/real-time-web-technologies-guide
> 
> Hopefully this gives some useful information or things to think about.
> Scaling realtime architecture is kind of hard (not impossible, but it can
> be a pain in the ass).
> 
> Regards,
> Micheil Smith
> --
> BrandedCode.com
> 
> On 06/05/2012, at 4:26 PM, jason.桂林 wrote:
> 
> > Thanks Roly, it's very useful for a single-machine app.
> >
> > I have a real app question: if we have millions of online users, how do
> > we compute system capacity, and how do we design an architecture to fit
> > that capacity?
> >
> >
> >
> > 2012/5/6 Roly Fentanes <roly...@gmail.com>
> > https://github.com/fent/socket.io-clusterhub
> >
> >
> > On Sunday, May 6, 2012 4:04:30 AM UTC-7, Jason.桂林(Gui Lin) wrote:
> > I just joined a hackathon party; our team made a very cool chat web
> > application in 24 hours.
> >
> > But I know it is a demo. It uses socket.io and redis; I think it is a
> > little expensive on sessions, and the processes can't communicate with
> > each other to make it a cluster.
> >
> > What could Node.js be used for? A frontend server? A core internal server?
> >
> > Somebody said ZMQ is a very fast message queue; would it help in this case?
> >
> >
> >
> > --
> > Best regards,
> >
> > 桂林 (Gui Lin)
> >
> > guileen@twitter
> > 桂林-V@weibo
> > guileen@github
> >
> >
> > --
> > Job Board: http://jobs.nodejs.org/
> > Posting guidelines: 
> > https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> > You received this message because you are subscribed to the Google
> > Groups "nodejs" group.
> > To post to this group, send email to nodejs@googlegroups.com
> > To unsubscribe from this group, send email to
> > nodejs+unsubscr...@googlegroups.com
> > For more options, visit this group at
> > http://groups.google.com/group/nodejs?hl=en
> >
> >
> >
> > --
> > Best regards,
> >
> > 桂林 (Gui Lin)
> >
> > guileen@twitter
> > 桂林-V@weibo
> > guileen@github
> >
> >
> 
> 
> 
> 
> -- 
> Best regards,
> 
> 桂林 (Gui Lin)
> 
> guileen@twitter
> 桂林-V@weibo
> guileen@github
> 

