When you get that big, with hundreds of thousands of users, I bet this article will apply:

http://news.cnet.com/8301-1009_3-57428067-83/fbi-we-need-wiretap-ready-web-sites-now/


What's harder, 1M users talking to each other, or logging and transmitting
those chats to a third party?

On Tue, May 8, 2012 at 5:36 PM, Micheil Smith <mich...@brandedcode.com>wrote:

> No worries, I am on twitter and github as "miksago".
>
> 1. Doing things pull-based is possibly a new way of thinking about
> realtime communication. I haven't yet seen it proven, but I think it
> makes sense: it means that if a server starts getting overloaded, it
> can throttle its incoming load rather than kill the rest of the
> servers in your cluster (situation: broadcast messages).
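A minimal sketch of the pull-based idea (the class and names here are hypothetical, not from any specific library): the consumer asks for at most a batch of messages when it has capacity, instead of having the producer push everything at it.

```javascript
// Pull-based delivery: an overloaded server controls its own intake
// by shrinking the batch size it pulls; excess stays queued upstream.
class Mailbox {
  constructor() { this.queue = []; }
  push(msg) { this.queue.push(msg); }
  // Remove and return at most `max` messages from the front.
  pull(max) { return this.queue.splice(0, max); }
}

const box = new Mailbox();
for (let i = 0; i < 10; i++) box.push({ id: i });

// Under load, pull a small batch; the rest stays queued at the source
// rather than flooding this process.
const batch = box.pull(3);
console.log(batch.length, box.queue.length); // 3 7
```

The key property is that backpressure is implicit: a slow server simply pulls less, and nothing forces messages onto it.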
>
> 2. I don't think msgpack is a protocol (in the sense of the word I was
> using). Internally, I would use a more structured data format such as
> Protobuf, which has a fairly strict declaration and parser for its
> data. Msgpack is more akin to JSON in that it's just a data format,
> not a data protocol; it's the way you use it that makes it a protocol.
>
> The protocols I was talking about were WebSocket sub-protocols, which
> are pretty specific to your application or domain.
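To illustrate the distinction: a data format only becomes a protocol once you fix the message shapes on top of it. A toy chat sub-protocol sketched over JSON (the same idea applies to msgpack or any other serialisation; the message types and field names here are hypothetical):

```javascript
// The protocol is the agreed set of message shapes, not the encoding.
const TYPES = ['join', 'chat'];

function encode(type, payload) {
  if (!TYPES.includes(type)) throw new Error('unknown message type');
  return JSON.stringify({ v: 1, type, payload });
}

function decode(frame) {
  const msg = JSON.parse(frame);
  if (msg.v !== 1) throw new Error('unsupported protocol version');
  if (!TYPES.includes(msg.type)) throw new Error('unknown message type');
  return msg;
}

const frame = encode('chat', { room: 42, text: 'hi' });
const msg = decode(frame);
console.log(msg.type, msg.payload.room); // chat 42
```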
>
> 3. I would go with a max of 25-75K concurrents per server in that
> case, which would mean 16 to 40 server processes. (Most likely you'd
> have those 16 segmented as 4 servers * 4 processes, assuming 4 cores.)
> Essentially, you don't want the load on any single server to be
> incredibly high; it's better to scale out horizontally a little more
> than you need, and then treat the headroom up to each server's high
> watermark as "burst capacity".
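The capacity arithmetic above, as a sketch (1M concurrents at 25-75K per process, packed onto hypothetical 4-core servers running 4 processes each):

```javascript
// Process-count estimate for 1M concurrent connections.
const totalUsers = 1000000;
const minProcesses = Math.ceil(totalUsers / 75000); // 14 raw...
const maxProcesses = Math.ceil(totalUsers / 25000); // 40
// Packing onto whole 4-core boxes rounds the low end up.
const servers = Math.ceil(minProcesses / 4);        // 4 servers
const packedProcesses = servers * 4;                // ...giving 16
console.log(packedProcesses, maxProcesses); // 16 40
```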
>
> That said, I would be surprised if anyone is really getting close to
> 500K concurrents on a single application (that's a number I'd expect
> from a service provider of realtime infrastructure).
>
> As for dealing with more servers, that's where something like Apache
> Kafka comes in; however, I'm still uncertain about using Kafka. You
> could also go the route of mesh networking with ZMQ, which works
> fairly well, but its setup and development are more complex, since
> every server talks to every other server.
>
> You don't want to be using broadcast messages if possible. That is,
> if you go with the pull-based setup, each server would have a mailbox
> per channel on your chat system, and servers would pull from only the
> servers and mailboxes that they are interested in. Likewise, if you
> go the route of central brokers (not that I recommend that), you can
> structure your queues and their key spaces into segments representing
> something like "chats:{CHAT ID}", or perhaps even
> "{PID / SERVER ID}:chats:{CHAT ID}". This means servers listen to
> only a subset of messages and don't receive every message in the
> system.
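A sketch of that keyed routing: the "chats:{CHAT ID}" key scheme is from the text above, while the data structures and server names are hypothetical.

```javascript
// Servers subscribe only to the channel keys they care about, so no
// server receives every message in the system.
const subscriptions = new Map(); // channel key -> Set of server ids

function subscribe(serverId, chatId) {
  const key = `chats:${chatId}`;
  if (!subscriptions.has(key)) subscriptions.set(key, new Set());
  subscriptions.get(key).add(serverId);
}

function route(chatId) {
  // Only servers subscribed to this chat's key receive the message.
  return [...(subscriptions.get(`chats:${chatId}`) || [])];
}

subscribe('srv-1', 42);
subscribe('srv-2', 42);
subscribe('srv-3', 7);

console.log(route(42)); // [ 'srv-1', 'srv-2' ]; srv-3 never sees chat 42
```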
>
> (Hopefully that last part makes sense; I'm a bit crammed for time
> writing it.)
>
> – Micheil
>
> On 07/05/2012, at 12:52 PM, jason.桂林 wrote:
>
> > Thanks Micheil, what you said is very professional. Do you have a
> > Twitter or G+ account? I'd like to follow you, heh.
> >
> > 1. What you said about pull-based rather than push-based looks like
> > a new way of thinking, but I can understand why you said it. I have
> > thought a lot about push-based message broadcast, and it's very
> > complex. Maybe pull-based will be a very simple and also beautiful
> > solution.
> >
> > 2. You mentioned a transport protocol. I'd like to use msgpack as
> > the protocol, but I need help with it: msgpack doesn't compress
> > strings, and I'm also afraid there may be some security problems.
> >
> > 3. " I would recommend looking into using more servers with lower load
> versus fewer servers with higher load;  " I'd like talk more about this,
> >
> > we have to use more servers for scaling, but more servers means more
> complex, unlike other web applications, realtime service need communication
>  between servers, we have 1M users dispatched on 1K servers, 1K user on
> each server, 1 user send a message in a room, this message will send to
> others users. In worst case the server for sender and server for reciever
> on cover all 1K server, so this message will send to all 1K server.
> >
> > if 100 user(10%) on each 1K servers send worst message, each server will
> recieve 100K messages in same time, it's horrible.
> >
> > How to prevent this happen?
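The worst-case fan-out described above, as arithmetic:

```javascript
// Broadcast amplification in the worst case.
const servers = 1000;
const usersPerServer = 1000;                   // 1M users / 1K servers
const sendersPerServer = usersPerServer * 0.1; // 100 senders (10%)
// Every message fans out to all 1K servers, so each server receives
// one copy of every message sent anywhere in the system.
const totalMessages = servers * sendersPerServer; // 100,000
const receivedPerServer = totalMessages;
console.log(receivedPerServer); // 100000
```

This is exactly why the keyed-subscription approach matters: without it, inbound load per server grows with the whole cluster's send rate, not with that server's own users.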
> >
> >
> > 2012/5/6 Micheil Smith <mich...@brandedcode.com>
> > If you have millions of users online, I think you'll be facing
> > problems beyond just Socket.io; some old-ish benchmarks showed
> > socket.io maxing out at around 5-20K concurrents in a single
> > process, while other websocket servers performed differently. If
> > you're serious about scaling realtime infrastructure, you should
> > probably look at talks from the Keeping It Realtime Conference
> > (http://2011.krtconf.com/), as well as the Autobahn Test Suite
> > benchmarks.
> >
> > Things to be cautious of:
> >
> >    - You'll need a way to do load balancing (traditional load
> >      balancers tend to fail pretty hard with WebSockets or
> >      persistent connections).
> >
> >    - I would NOT recommend using redis or any other centralised
> >      message bus. It is by far the easiest way to do scaling across
> >      multiple servers; however, it's also the easiest way to shoot
> >      yourself in the foot if the message bus goes down (process
> >      crash, server network isolation, etc.).
> >
> >    - I would recommend looking into using more servers with lower
> >      load versus fewer servers with higher load; this will enable
> >      you to scale much better in short bursts. (Experience tells me
> >      that your application or service will generally have peaks and
> >      troughs in usage, and these generally match up with the three
> >      main timezone blocks: US, GMT, and East Asian / Oceanic.)
> >
> > Those points aside, getting above 100K concurrent users tends to be
> > incredibly hard. Some of the largest apps around that I've seen
> > have only just been pushing 250K (we're talking big service
> > providers that have 500K -> 2M users; I can't name them for legal
> > reasons).
> >
> > As for storage of data, you will most likely need both realtime
> > communication between servers and some sort of key/value store for
> > things like presence information and authentication tokens. For the
> > storage side, I would actually recommend redis; it tends to scale
> > out really well for master / slave type setups. As for message
> > communication, I'm beginning to think that pull-based may be better
> > than push-based, so something like Apache Kafka (not that I've had
> > personal experience with it).
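For presence, one common key/value pattern is a key per user with a short TTL that the client's heartbeat refreshes; a lapsed TTL means offline. This in-memory sketch (all names hypothetical) mirrors what you'd do in redis with an expiring SET plus periodic refreshes:

```javascript
// Presence tracking via expiring entries.
const TTL_MS = 30000;
const presence = new Map(); // userId -> expiry timestamp (ms)

// Each heartbeat pushes the user's expiry forward by one TTL.
function heartbeat(userId, now) { presence.set(userId, now + TTL_MS); }

// A user is online only while their expiry is still in the future.
function isOnline(userId, now) {
  const expiry = presence.get(userId);
  return expiry !== undefined && expiry > now;
}

heartbeat('alice', 0);
console.log(isOnline('alice', 10000)); // true, within TTL
console.log(isOnline('alice', 40000)); // false, TTL lapsed
```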
> >
> > You will most likely also want to define a transport protocol on
> > top of your connection, dependent on your type of application.
> > There aren't many resources on doing this, but if you want help
> > with it, give me a shout; I've done a lot of research into that
> > area over the last two years.
> >
> > Alternatively, you could look at third-party services for scaling
> > your realtime architecture. At present, given the information I
> > have on various services, I would be inclined to recommend PubNub
> > (http://pubnub.com); they appear to have a very high quality setup.
> > (Disclaimer: I did work for a competitor in the past, but that does
> > not bias my choice.) Another option is Pusher (http://pusher.com),
> > or for more, you can look here:
> > http://www.leggetter.co.uk/real-time-web-technologies-guide
> >
> > Hopefully this gives you some useful information and things to
> > think about. Scaling realtime architecture is kind of hard (not
> > impossible, but it can be a pain in the ass).
> >
> > Regards,
> > Micheil Smith
> > --
> > BrandedCode.com
> >
> > On 06/05/2012, at 4:26 PM, jason.桂林 wrote:
> >
> > > Thanks Roly, that's very useful for a single-machine app.
> > >
> > > I have a real app question: if we have millions of online users,
> > > how do we compute the system capacity, and how do we design an
> > > architecture to fit that capacity?
> > >
> > >
> > >
> > > 2012/5/6 Roly Fentanes <roly...@gmail.com>
> > > https://github.com/fent/socket.io-clusterhub
> > >
> > >
> > > On Sunday, May 6, 2012 4:04:30 AM UTC-7, Jason.桂林(Gui Lin) wrote:
> > > I just joined a hackathon; our team made a very cool chat web
> > > application in 24 hours.
> > >
> > > But I know it is a demo. It uses socket.io and redis; I think
> > > that is a little expensive on sessions, and it can't communicate
> > > between processes, which makes clustering hard.
> > >
> > > What could Node.js be used for? A frontend server? A core
> > > internal server?
> > >
> > > Somebody said ZMQ is a very fast message queue; would it help
> > > with this case?
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > 桂林 (Gui Lin)
> > >
> > > guileen@twitter
> > > 桂林-V@weibo
> > > guileen@github
> > >
> > >
> > > --
> > > Job Board: http://jobs.nodejs.org/
> > > Posting guidelines:
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> > > You received this message because you are subscribed to the Google
> > > Groups "nodejs" group.
> > > To post to this group, send email to nodejs@googlegroups.com
> > > To unsubscribe from this group, send email to
> > > nodejs+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/nodejs?hl=en
> > >
> >
> >
> >
> >
> > --
> > Best regards,
> >
> > 桂林 (Gui Lin)
> >
> > guileen@twitter
> > 桂林-V@weibo
> > guileen@github
> >
>
>
